Abstract. We derive explicit upper bounds for the $\bar d$-distance between a chain of infinite order and its canonical $k$-steps Markov approximation. Our proof is entirely constructive and involves a "coupling from the past" argument. The new method covers not necessarily continuous probability kernels, and chains with null transition probabilities. In particular, these results imply the Bernoulli property for these processes.



1. INTRODUCTION 

Chains of infinite order are random processes that are specified by probability kernels (conditional probabilities), which may depend on the whole past. They provide a flexible model that is very useful in different areas of applied probability and statistics, from bioinformatics Bejerano & Yona (2001); Busch et al. (2009) to linguistics Galves et al. (2010, 2009). They are also models of considerable theoretical interest in ergodic theory Coelho & Quas (1998); Hulse (1991); Quas (1996); Walters (2007) and in the general theory of stochastic processes Bramson & Kalikow (1993); Comets et al. (2002); Fernandez & Maillard (2005). A natural approach to study chains of infinite order is to approximate the original process by Markov chains of growing orders. In this article, we derive new upper bounds on the $\bar d$-distance between a chain and its canonical $k$-steps Markov approximation.



Introduced by Ornstein (1974) to study the isomorphism problem for Bernoulli shifts, the $\bar d$-metric is of fundamental importance in ergodic theory, where chains of infinite order are also known as $g$-measures. The $\bar d$-distance between two processes can be informally described as the minimal proportion of times we have to change a typical realization of one process in order to obtain a typical realization of the other. Ornstein (1974) showed that the set of processes which are measure-theoretically isomorphic to Bernoulli shifts is $\bar d$-closed. Ergodic Markov chains are examples of processes that are isomorphic to Bernoulli shifts. Therefore, if a process can be approximated arbitrarily well in the $\bar d$-metric by a sequence of ergodic Markov chains, then this process has the Bernoulli property. In this article we prove the existence of Markov approximation schemes for classes of chains of infinite order with not necessarily continuous and possibly null transition probabilities. Some of these processes were not considered before. For example, Coelho & Quas (1998), Fernandez & Galves (2002), and Johansson et al. (2010) required the continuity of the probability kernels. Our results show that these new examples are isomorphic to Bernoulli shifts, and provide explicit upper bounds for the Markov approximation in several important cases, thereby giving information on how good these approximations are.

Besides ergodic theory, the $\bar d$-distance is useful in statistics and information theory. Rissanen (1983) proposed to model data as realizations of stochastic chains, and proved that these data can be optimally compressed using the (unknown) probability kernel of the chain. The statistical problem is then to recover this probability kernel from the observation of typical data. Since the number of parameters to estimate is infinite, this task is impossible in general. A possible strategy to overcome this problem is the following: (1) couple the original chain with a Markov approximation and (2) work with the approximating Markov chain. The $\bar d$-distance between the chain and its Markov approximation controls the error made in step (1). The idea is that, if this control is good enough, the good properties of the approximating Markov chain proved in step (2) can be used to study the original chain. For instance, Duarte et al. (2006) and Csiszar & Talata (2010) derived consistency results for chains of infinite order from the consistency of BIC estimators for Markov chains proved in Csiszar & Talata (2006). This "two steps" procedure was also used in Collet et al. (2005) to obtain a bootstrap central limit theorem for chains of infinite order from the renewal property of the approximating Markov chains.

Our main results derive from coupling arguments. We first introduce a flexible class of coupling from the past algorithms (CFTP algorithms, see Section 2.3). CFTP algorithms constitute an important class of perfect simulation algorithms popularized by Propp & Wilson (1996). Our main assumption on the chain is that the original chain of infinite order can be perfectly simulated via such CFTP algorithms. We state a technical result, Lemma 4.1, which provides an abstract upper bound for the $\bar d$-distance with the canonical Markov approximation. This bound is then made explicit under various extra assumptions on the process, used in the study of the CFTP algorithms of Comets et al. (2002), De Santis & Piccioni (2010), Gallo (2011) and Gallo & Garcia (2011).



To our knowledge, Fernandez & Galves (2002) provide the best explicit bounds in the literature for the $\bar d$-distance between a chain of infinite order and its canonical Markov approximation, depending only on the continuity rate of the probability kernels. Their result applies to weakly non-null chains having summable continuity rates. Our method recovers the same bounds, replacing weak non-nullness by a weaker assumption, see Theorem 4.1. Assuming weak non-nullness, we also obtain explicit upper bounds in some non-summable continuity regimes, and in other regimes where the kernel is not even necessarily continuous but satisfies certain types of localized continuity, as introduced in De Santis & Piccioni (2010), Gallo (2011) and Gallo & Garcia (2011). This is the content of Theorems 4.2 and 4.3, which provide, as far as we know, the first results for non-continuous chains. Our results should also be compared with those of Johansson et al. (2010), who prove the Bernoulli property in the square summable continuity regime assuming strong non-nullness, although they do not provide an explicit upper bound for the approximations.

The paper is organized as follows. In Section 2 we introduce the notation and basic definitions used throughout the paper. In Section 3 we construct the coupling between the original chain and its canonical Markov approximation, and we introduce the class of CFTP algorithms perfectly simulating the chains. Our main results are stated in Section 4. We postpone the proofs to Section 5. For the convenience of the reader, we gather in the Appendix some extensions and technical results on the "house of cards" processes that are useful in our applications and are of independent interest.



MARKOV APPROXIMATION OF CHAINS OF INFINITE ORDER IN THE d-METRIC 3 

2. NOTATION, DEFINITIONS AND BACKGROUND 

2.1. Notation. We use the conventions $\mathbb N^*=\mathbb N\setminus\{0\}$ and $\bar{\mathbb N}=\mathbb N\cup\{\infty\}$. Let $A$ be the set $\{1,2,\dots,N\}$ for some $N\in\mathbb N$. Given two integers $m\le n$, let $a_m^n$ be the string $a_m\dots a_n$ of symbols in $A$. For any $m\le n$, the length of the string $a_m^n$ is denoted by $|a_m^n|$ and defined by $n-m+1$. Let $\emptyset$ denote the empty string, of length $|\emptyset|=0$. For any $n\in\mathbb Z$, we will use the convention that $a_{n+1}^{n}=\emptyset$, and naturally $|a_{n+1}^{n}|=0$. Given two strings $v$ and $v'$, we denote by $vv'$ the string of length $|v|+|v'|$ obtained by concatenating the two strings. If $v'=\emptyset$, then $v\emptyset=\emptyset v=v$. The concatenation of strings is also extended to the case where $v=\dots a_{-2}a_{-1}$ is a semi-infinite sequence of symbols. If $n\in\mathbb N^*$ and $v$ is a finite string of symbols in $A$, $v^n=v\dots v$ is the concatenation of $n$ times the string $v$. In the case where $n=0$, $v^0$ is the empty string $\emptyset$. Let
$$A^{-\mathbb N}=A^{\{\dots,-2,-1\}}\qquad\text{and}\qquad A^{\star}=\bigcup_{j=0}^{+\infty}A^{\{-j,\dots,-1\}}$$
be, respectively, the set of all infinite strings of past symbols and the set of all finite strings of past symbols. The case $j=0$ corresponds to the empty string $\emptyset$. Finally, we denote by $\underline a=\dots a_{-2}a_{-1}$ the elements of $A^{-\mathbb N}$.

2.2. Kernels, chains and coupling. 

Definition 2.1. A family of transition probabilities, or kernel, on an alphabet $A$ is a function
$$P: A\times A^{-\mathbb N}\to[0,1],\qquad (a,\underline x)\mapsto P(a|\underline x)$$
such that
$$\sum_{a\in A}P(a|\underline x)=1,\qquad\forall\,\underline x\in A^{-\mathbb N}.$$
$P$ is called a Markov kernel if there exists $k$ such that $P(a|\underline x)=P(a|\underline y)$ whenever $x_{-k}^{-1}=y_{-k}^{-1}$. In the present paper we are mostly interested in non-Markov kernels, in which $P(a|\underline x)$ may depend on the whole past $\underline x$.

Definition 2.2. A stationary stochastic chain $X=\{X_n\}_{n\in\mathbb Z}$ with distribution $\mu$ on $A^{\mathbb Z}$ is said to be compatible with a family of transition probabilities $P$ if the latter is a regular version of the conditional probabilities of the former, that is
$$\mathbb P(X_0=a\mid X_{-\infty}^{-1}=\underline x)=P(a|\underline x)\qquad(1)$$
for every $a\in A$ and $\mu$-a.e. $\underline x$ in $A^{-\mathbb N}$.

If $P$ is non-Markov, it may be hard to prove the existence of a stationary chain $X$ compatible with it. To solve this issue, we assume the existence of coupling from the past algorithms for the chain (see Section 2.3). This "constructive argument" guarantees the existence and uniqueness of $X$.

Definition 2.3 (Canonical $k$-steps Markov approximation). Assume that $X$ is a stationary chain with distribution $\mu$. The canonical $k$-steps Markov approximation of $X$ is the stationary $k$-steps Markov chain $X^{[k]}$ compatible with the kernel $P^{[k]}_\mu$, defined as
$$P^{[k]}_\mu(a|x_{-k}^{-1}):=\mu(X_0=a\mid X_{-k}^{-1}=x_{-k}^{-1}).$$
Since $\mu$ is uniquely determined by $P$, we will no longer mention the subscript $\mu$ in $P^{[k]}_\mu$; it will be understood that $P^{[k]}=P^{[k]}_\mu$.
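To make Definition 2.3 concrete, the following Python sketch computes $P^{[k]}(2|a_{-k}^{-1})$ for the renewal-type kernel $P(2|\underline a)=p_{t(\underline a)}$ defined in (12), Section 4.3. It relies on assumptions that are ours, not the paper's: the sequence $(p_i)$ is encoded by a finite list whose last entry is repeated forever, and the infinite sums over the age $t$ are truncated at a large `horizon`, which must exceed $k$. All names (`p_at`, `age_weights`, `markov_approx_prob2`) are hypothetical. When a 2 is visible in the window, $P^{[k]}$ coincides with $P$; otherwise the window is $1^k$ and $P^{[k]}$ averages $p_t$ over the stationary law of the age $t$ conditioned on $t\ge k$.

```python
def p_at(p, i):
    """Return p_i, assuming p_i = p[-1] for every i >= len(p) (hypothetical encoding)."""
    return p[min(i, len(p) - 1)]


def age_weights(p, horizon):
    """Unnormalised stationary weights w_j of the age t (number of 1's since the last 2).

    The age jumps to 0 with probability p_t and to t + 1 otherwise,
    so w_0 = 1 and w_{j+1} = w_j * (1 - p_j). Truncated at `horizon` terms.
    """
    w = [1.0]
    for j in range(horizon - 1):
        w.append(w[-1] * (1.0 - p_at(p, j)))
    return w


def markov_approx_prob2(p, window, horizon=10_000):
    """P^[k](2 | a_{-k}^{-1}) for the kernel P(2 | a) = p_{t(a)}.

    `window` lists a_{-k}, ..., a_{-1} (oldest first); k = len(window) < horizon.
    """
    k = len(window)
    if 2 in window:
        # The last 2 is visible: t(a) is read off the window and P^[k] = P.
        t = window[::-1].index(2)
        return p_at(p, t)
    # Window 1^k means t >= k: average p_t under the stationary law of t given t >= k.
    w = age_weights(p, horizon)
    num = sum(w[j] * p_at(p, j) for j in range(k, horizon))
    den = sum(w[j] for j in range(k, horizon))
    return num / den


print(markov_approx_prob2([0.5, 0.2], []))      # about 0.2857 = 1/3.5
print(markov_approx_prob2([0.5, 0.2], [2, 1]))  # 0.2, since t(a) = 1
```

With $p=(0.5,0.2,0.2,\dots)$ the empty window returns $P^{[0]}(2)=1/3.5$, the stationary frequency of the symbol 2, since the stationary age weights are $1, 0.5, 0.4, 0.32,\dots$ with total mass $3.5$.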

Let us recall that a coupling between two chains $X$ and $Y$ taking values in the same alphabet $A$ is a stochastic chain $Z=\{Z_n\}_{n\in\mathbb Z}=\{(\tilde X_n,\tilde Y_n)\}_{n\in\mathbb Z}$ on $(A\times A)^{\mathbb Z}$ such that $\tilde X$ has the same distribution as $X$ and $\tilde Y$ has the same distribution as $Y$. For any pair of stationary chains $X$ and $Y$, let $\mathcal C(X,Y)$ be the set of couplings between $X$ and $Y$.

Definition 2.4 ($\bar d$-distance). The $\bar d$-distance between two stationary chains $X$ and $Y$ is defined by
$$\bar d(X,Y)=\inf_{(\tilde X,\tilde Y)\in\mathcal C(X,Y)}\mathbb P(\tilde X_0\neq\tilde Y_0).$$
For the class of ergodic processes, this distance has another, more intuitive interpretation: it is the minimal proportion of sites we have to change in a typical realization of $X$ in order to obtain a typical realization of $Y$. Formally,
$$\bar d(X,Y)=\inf_{(\tilde X,\tilde Y)\in\mathcal C(X,Y)}\lim_{n\to+\infty}\frac1n\sum_{i=1}^{n}\mathbf 1\{\tilde X_i\neq\tilde Y_i\}.$$
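As a sanity check of Definition 2.4, consider two i.i.d. processes with marginals $\mu$ and $\nu$ on $A$. Any coupling satisfies $\mathbb P(\tilde X_0\neq\tilde Y_0)\ge d_{TV}(\mu,\nu)=\frac12\sum_a|\mu(a)-\nu(a)|$, and coupling the sites independently, each by a maximal coupling, attains this value, so that $\bar d(X,Y)=d_{TV}(\mu,\nu)$ in this case. The Python sketch below (all function names are ours) simulates such a site-wise maximal coupling and computes the empirical proportion of disagreements appearing in the ergodic formula above.

```python
import random


def total_variation(mu, nu):
    """d_TV(mu, nu) = 1/2 * sum_a |mu(a) - nu(a)| for distributions given as dicts."""
    symbols = set(mu) | set(nu)
    return 0.5 * sum(abs(mu.get(a, 0.0) - nu.get(a, 0.0)) for a in symbols)


def maximal_coupling(mu, nu, rng):
    """Draw (x, y) with x ~ mu, y ~ nu and P(x != y) = d_TV(mu, nu)."""
    symbols = sorted(set(mu) | set(nu))
    common = [min(mu.get(a, 0.0), nu.get(a, 0.0)) for a in symbols]
    res_mu = [mu.get(a, 0.0) - c for a, c in zip(symbols, common)]
    res_nu = [nu.get(a, 0.0) - c for a, c in zip(symbols, common)]
    if min(sum(res_mu), sum(res_nu)) == 0.0 or rng.random() < sum(common):
        # Common part: draw one symbol and copy it to both coordinates.
        x = rng.choices(symbols, weights=common)[0]
        return x, x
    # Residual parts have disjoint supports, hence x != y on this event.
    x = rng.choices(symbols, weights=res_mu)[0]
    y = rng.choices(symbols, weights=res_nu)[0]
    return x, y


def empirical_mismatch(mu, nu, n, seed=0):
    """(1/n) * #{i <= n : X_i != Y_i} for n independent site-wise maximal couplings."""
    rng = random.Random(seed)
    return sum(x != y for x, y in (maximal_coupling(mu, nu, rng) for _ in range(n))) / n


mu, nu = {1: 0.7, 2: 0.3}, {1: 0.4, 2: 0.6}
print(total_variation(mu, nu), empirical_mismatch(mu, nu, 20_000))
```

Both printed numbers should be close to $0.3$; for general (non-i.i.d.) chains the infimum over couplings is of course much harder to evaluate, which is why the paper bounds it through explicit CFTP couplings.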

2.3. Coupling from the past algorithm (CFTP). Our CFTP algorithm constructs a sample of the stationary chain compatible with a given kernel $P$, using a sequence $U=\{U_n\}_{n\in\mathbb Z}$ of i.i.d. random variables uniformly distributed in $[0,1[$. We denote by $(\Omega,\mathcal F,\mathbb P)$ the probability space associated to $U$. The CFTP is completely determined by its update function $F:(A^{-\mathbb N}\cup A^\star)\times[0,1[\to A$, which satisfies, for any $\underline a\in A^{-\mathbb N}$ and any $a\in A$, $\mathbb P(F(\underline a,U_0)=a)=P(a|\underline a)$. Using this function, we define the set of coalescence times and the reconstruction function $\Phi$ associated to $F$. For any pair of integers $m,n$ such that $-\infty<m\le n<+\infty$, let $F_{\{m,n\}}(\underline a,U_m^n)\in A^{n-m+1}$ be the sample obtained by applying $F$ recursively on the fixed past $\underline a$, i.e., let $F_{\{m,m\}}(\underline a,U_m):=F(\underline a,U_m)$ and
$$F_{\{m,n\}}(\underline a,U_m^n):=F_{\{m,n-1\}}(\underline a,U_m^{n-1})\,F\big(\underline aF_{\{m,n-1\}}(\underline a,U_m^{n-1}),U_n\big).$$
Secondly, let $F_{[m,m]}(\underline a,U_m):=F(\underline a,U_m)$ and
$$F_{[m,n]}(\underline a,U_m^n):=F\big(\underline aF_{\{m,n-1\}}(\underline a,U_m^{n-1}),U_n\big).\qquad(2)$$
$F_{[m,n]}(\underline a,U_m^n)$ is the last symbol of the sample $F_{\{m,n\}}(\underline a,U_m^n)$. The set
$$\Theta[n]:=\{j\le n : F_{[j,n]}(\underline a,U_j^n)=F_{[j,n]}(\underline b,U_j^n)\ \text{for all}\ \underline a,\underline b\in A^{-\mathbb N}\}\qquad(3)$$
is called the set of coalescence times for the time index $n$. Finally, the reconstruction function of time $n$ is defined by
$$[\Phi(U)]_n:=F_{[\theta[n],n]}(\underline a,U_{\theta[n]}^n)\qquad(4)$$
where $\theta[n]$ is any element of $\Theta[n]$. Given a kernel $P$, if $\Theta[0]\neq\emptyset$ a.s., and therefore $\Theta[n]\neq\emptyset$ a.s. for any $n\in\mathbb Z$, then $[\Phi(U)]_n$ is distributed according to the unique stationary measure compatible with $P$, see De Santis & Piccioni (2010).
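The recursion (2)-(4) can be illustrated in the simplest case of a first-order Markov kernel, where a past $\underline a$ only matters through its last symbol, so that the condition "for all $\underline a,\underline b\in A^{-\mathbb N}$" in (3) reduces to a finite check. The following Python sketch is the classical Propp-Wilson scheme under this assumption; the two-state transition matrix and the inverse-CDF update function are hypothetical choices made for illustration only. It tries $j=0,-1,-2,\dots$, always reusing the same uniforms $U_j,\dots,U_0$, and returns $[\Phi(U)]_0$ as soon as $j\in\Theta[0]$.

```python
import random

# Hypothetical first-order kernel on A = {1, 2}: P[prev][a] = P(a | a_{-1} = prev).
P = {1: {1: 0.9, 2: 0.1},
     2: {1: 0.3, 2: 0.7}}


def update(prev, u):
    """Update function F: F(prev, U) = a with probability P(a | prev) (inverse CDF)."""
    return 1 if u < P[prev][1] else 2


def cftp_sample(rng):
    """Return [Phi(U)]_0 using the largest coalescence time j in Theta[0]."""
    uniforms = []                          # uniforms[m] stores U_{-m}; never redrawn
    j = 0
    while True:
        while len(uniforms) <= -j:         # extend U further into the past if needed
            uniforms.append(rng.random())
        finals = set()
        for start in P:                    # every possible symbol at time j - 1
            x = start
            for m in range(-j, -1, -1):    # apply F at times j, j + 1, ..., 0
                x = update(x, uniforms[m])
            finals.add(x)                  # finals collects F_[j,0](a, U_j^0)
        if len(finals) == 1:               # all pasts agree: j is in Theta[0]
            return finals.pop()
        j -= 1


rng = random.Random(1)
samples = [cftp_sample(rng) for _ in range(20_000)]
print(samples.count(1) / len(samples))     # close to the stationary value 0.75
```

For this matrix the stationary probability of the symbol 1 is $0.3/(0.1+0.3)=0.75$. Reusing the same uniforms when moving $j$ further into the past is essential: redrawing them would bias the output.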



3. CONSTRUCTION OF THE COUPLING 

For any $\underline a\in A^{-\mathbb N}$, let $\mathcal I(\underline a):=\{I_k(a|a_{-k}^{-1})\}_{k\in\bar{\mathbb N},\,a\in A}$ be any partition of $[0,1[$ having the following properties:

(1) for any $k\in\mathbb N$, the Lebesgue measure (or length) $|I_k(a|a_{-k}^{-1})|$ of $I_k(a|a_{-k}^{-1})$ only depends on $a$ and $a_{-k}^{-1}$,

(2) for any $a$ and $\underline a$,
$$\sum_{k\in\bar{\mathbb N}}|I_k(a|a_{-k}^{-1})|=P(a|\underline a),$$

(3) the intervals are disposed as represented in the upper part of Figure 1.



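[Figure 1: the interval $[0,1[$ split, from left to right, into $I_0(1|\emptyset)$, $I_0(2|\emptyset)$, $I_1(1|a_{-1})$, $I_1(2|a_{-1})$, ..., $I_k(1|a_{-k}^{-1})$, $I_k(2|a_{-k}^{-1})$, ..., $I_\infty(2|\underline a)$, with marks at $\alpha_1(a_{-1})$ and $\alpha_k(a_{-k}^{-1})$; in the lower partition the intervals beyond level $k$ are replaced by $I^{[k]}(1|a_{-k}^{-1})$ and $I^{[k]}(2|a_{-k}^{-1})$.]

Figure 1. Illustration of a range partition related to some infinite past $\underline a$. The upper partition is the one used for the original kernel $P$, whereas the one below is used for the approximating kernel $P^{[k]}$.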



Definition 3.1. We call range partitions the partitions of [0, 1[ satisfying (1), (2) and 
(3) for some kernel P. 



The following lemma is proved in Section 5.1.

Lemma 3.1. A set of range partitions satisfies, for any $\underline a$ and $a\in A$,
$$\sum_{i=0}^{k}|I_i(a|a_{-i}^{-1})|\le\inf_{\underline z}P(a|a_{-k}^{-1}\underline z),\qquad\forall\,k\ge0.$$



Given a range partition $\mathcal I(\underline a)$, the following $F$ is an update function, due to property (2):
$$F(\underline a,U_0):=\sum_{a\in A}a\,\mathbf 1\Big\{U_0\in\bigcup_{k\in\bar{\mathbb N}}I_k(a|a_{-k}^{-1})\Big\}.\qquad(5)$$
This function $F$ explains the name "range partition": for a given past $\underline a$, when the uniform r.v. $U_0$ belongs to $\bigcup_{a\in A}\bigcup_{i=0}^{k}I_i(a|a_{-i}^{-1})$, $F$ constructs a symbol looking at a range $\le k$ in the past.
Let $L:(A^{-\mathbb N}\cup A^\star)\times[0,1[\to\{0,1,2,\dots\}$ be the range function defined by
$$L(\underline a,u):=\sum_{k\in\bar{\mathbb N}}k\,\mathbf 1\Big\{u\in\bigcup_{a\in A}I_k(a|a_{-k}^{-1})\Big\}.\qquad(6)$$
$L$ associates to a past $\underline a$ and a real number $u\in[0,1[$ the length of the suffix of $\underline a$ that $F$ needs in order to construct the next symbol when $U_0=u$.
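The update function (5) and the range function (6) amount to locating $u$ in the ordered list of intervals. Below is a minimal Python sketch for one fixed past, assuming (our convention, in the spirit of Figure 1) that the intervals are laid out level by level, $k=0,1,2,\dots$, and in increasing symbol order inside each level; `levels[k]` holds the lengths $|I_k(b|a_{-k}^{-1})|$, and the function name is hypothetical.

```python
def update_and_range(levels, u):
    """Return (F(a, u), L(a, u)) as in (5)-(6) for one fixed past a.

    levels[k] maps each symbol b to the length |I_k(b | a_{-k}^{-1})|.
    Assumed layout: level 0 first, then level 1, ...; symbols in increasing
    order inside a level. Lengths are assumed to sum to 1.
    """
    right = 0.0
    for k, level in enumerate(levels):
        for b in sorted(level):
            right += level[b]              # right end of the interval I_k(b | .)
            if u < right:
                return b, k                # symbol F(a, u) and range L(a, u) = k
    raise ValueError("u is not covered: the lengths must sum to 1")


# Hypothetical partition for a kernel of the form (12): alpha(1) = 0.3, alpha(2) = 0.2,
# t(a) = 2 and p_{t(a)} = 0.6, so only levels 0 and t(a) + 1 = 3 are non-empty.
levels = [{1: 0.3, 2: 0.2}, {}, {}, {1: 0.1, 2: 0.4}]
print(update_and_range(levels, 0.1))   # (1, 0): no past symbol is needed
print(update_and_range(levels, 0.9))   # (2, 3): the last three symbols are needed
```

In this example the symbol 2 is produced with total probability $0.2+0.4=0.6=P(2|\underline a)$, in accordance with property (2), and the range equals 3 exactly when $u\ge0.5$.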



Using these functions, define, as in Section 2.3, the related coalescence sets $\Theta[i]$, $i\in\mathbb Z$, and the reconstruction function $\Phi(U)$, which is distributed according to the unique stationary distribution compatible with $P$ whenever $\Theta[0]$ is a.s. non-empty.
Let us now define the functions $F^{[k]}$ and $L^{[k]}$ that we will use for the construction of $X^{[k]}$. Observe that, on the one hand, by definition of the canonical $k$-steps Markov approximation, we have for any $a\in A$ and $a_{-k}^{-1}\in A^k$
$$P^{[k]}(a|a_{-k}^{-1}):=\mathbb P(X_0=a\mid X_{-k}^{-1}=a_{-k}^{-1})=\int_{A^{-\mathbb N}}P(a|a_{-k}^{-1}\underline z)\,d\mu(\underline z|a_{-k}^{-1})\ge\inf_{\underline z}P(a|a_{-k}^{-1}\underline z).$$
On the other hand, by Lemma 3.1, $\inf_{\underline z}P(a|a_{-k}^{-1}\underline z)\ge\sum_{j=0}^{k}|I_j(a|a_{-j}^{-1})|$. Thus we can define, for any $a_{-k}^{-1}$, the set of intervals $\{I^{[k]}(a|a_{-k}^{-1})\}_{a\in A}$ having length
$$|I^{[k]}(a|a_{-k}^{-1})|=P^{[k]}(a|a_{-k}^{-1})-\sum_{j=0}^{k}|I_j(a|a_{-j}^{-1})|$$
and disposed as in Figure 1. The functions $F^{[k]}$ and $L^{[k]}$ are defined as follows:
$$F^{[k]}(a_{-k}^{-1},U_0):=\sum_{a\in A}a\,\mathbf 1\Big\{U_0\in\bigcup_{j=0}^{k}I_j(a|a_{-j}^{-1})\cup I^{[k]}(a|a_{-k}^{-1})\Big\}\qquad(7)$$
and
$$L^{[k]}(a_{-k}^{-1},U_0):=\sum_{j=0}^{k}j\,\mathbf 1\Big\{U_0\in\bigcup_{a\in A}I_j(a|a_{-j}^{-1})\Big\}+k\,\mathbf 1\Big\{U_0\in\bigcup_{a\in A}I^{[k]}(a|a_{-k}^{-1})\Big\}.\qquad(8)$$



Using these functions, define, as in Section 2.3, the related coalescence sets $\Theta^{[k]}[i]$, $i\in\mathbb Z$, and the reconstruction function $\Phi^{[k]}(U)$, which is distributed according to the unique stationary distribution compatible with $P^{[k]}$ whenever $\Theta^{[k]}[0]$ is a.s. non-empty.

Using the same sequence of uniforms $U$, and assuming that $\Theta[0]$ and $\Theta^{[k]}[0]$ are a.s. non-empty, $(\Phi(U),\Phi^{[k]}(U))$ is an $(A\times A)$-valued chain with coordinates distributed as $X$ and $X^{[k]}$, respectively. It follows that $(\Phi(U),\Phi^{[k]}(U))$ is a coupling between both chains. Hence, we have constructed a CFTP algorithm for perfect simulation of the coupled chains.
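This coupling is efficient because $F$ and $F^{[k]}$ share the intervals of levels $0,\dots,k$ and can only differ when $U_0$ falls beyond them, which is the content of (18) in Section 5.2. The following Python sketch, with hypothetical names and a hypothetical value of $P^{[1]}$, builds the blocks of $F^{[k]}$ from those of $F$, with $|I^{[k]}(b|a_{-k}^{-1})|=P^{[k]}(b|a_{-k}^{-1})-\sum_{j\le k}|I_j(b|a_{-j}^{-1})|$ placed after level $k$, and checks that both update functions agree whenever the range of $F$ is at most $k$.

```python
def locate(blocks, u):
    """Return (symbol, block index) of the interval containing u; blocks laid out in order."""
    right = 0.0
    for k, block in enumerate(blocks):
        for b in sorted(block):
            right += block[b]
            if u < right:
                return b, k
    raise ValueError("the lengths must sum to 1")


def markov_blocks(levels, k, pk):
    """Blocks of F^[k]: levels 0..k of F, then the intervals I^[k](b | a_{-k}^{-1}).

    pk[b] stands for P^[k](b | a_{-k}^{-1}); |I^[k](b|.)| = pk[b] - sum_{j<=k} |I_j(b|.)|.
    """
    kept = levels[: k + 1]
    rest = {b: pk[b] - sum(level.get(b, 0.0) for level in kept) for b in pk}
    if any(length < 0 for length in rest.values()):
        raise ValueError("pk violates the bound of Lemma 3.1")
    return kept + [rest]


# Hypothetical partition of F for one past (kernel of the form (12), t(a) = 2).
levels = [{1: 0.3, 2: 0.2}, {}, {}, {1: 0.1, 2: 0.4}]
approx = markov_blocks(levels, 1, {1: 0.5, 2: 0.5})      # hypothetical P^[1](.|a_{-1})
# Ranges <= 1 occur for u < 0.5; there F and F^[1] must agree.
for u in (0.1, 0.4, 0.65):
    print(u, locate(levels, u), locate(approx, u))
```

For $u=0.65$ the two functions return different symbols; this can only happen on the event $\{L(\underline a,U_0)>k\}$, whose probability Lemma 4.1 controls along the whole reconstruction.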



4. STATEMENTS OF THE RESULTS 

4.1. A key lemma. Let us first state a technical lemma that is central in the proof of 
our main results. 

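Lemma 4.1. Assume that there exists a set of range partitions $\{\mathcal I(\underline a)\}_{\underline a}$ such that the set of coalescence times $\Theta[0]\cap\Theta^{[k]}[0]$ is $\mathbb P$-a.s. non-empty. Then, for any $\theta[0]\in\Theta[0]$,
$$\bar d(X,X^{[k]})\le\mathbb P\Big(\bigcup_{\underline a}\bigcup_{i=\theta[0]}^{0}\big\{L\big(\underline aF_{\{\theta[0],i-1\}}(\underline a,U_{\theta[0]}^{i-1}),U_i\big)>k\big\}\Big)\qquad(9)$$
where, for $i=\theta[0]$, the event reads $\{L(\underline a,U_{\theta[0]})>k\}$.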

Examples of range partitions satisfying the conditions of this lemma have already been built, for example in Comets et al. (2002), Gallo (2011), Gallo & Garcia (2011) and De Santis & Piccioni (2010). These works assume some regularity conditions on $P$ and some non-nullness hypotheses, which are presented in Sections 4.2, 4.3 and 4.4. In these sections, we derive explicit upper bounds for (9) under the respective assumptions. Before that, let us give an interesting remark on Bernoullicity.

Observation 4.1 (A remark on Bernoullicity). Under the conditions of each of the works cited above, we will exhibit $\theta[0]\in\Theta[0]$ which belongs to $\Theta^{[k]}[0]$ for any sufficiently large $k$, and we will prove that
$$\mathbb P\Big(\bigcup_{\underline a}\bigcup_{i=\theta[0]}^{0}\big\{L\big(\underline aF_{\{\theta[0],i-1\}}(\underline a,U_{\theta[0]}^{i-1}),U_i\big)>k\big\}\Big)\xrightarrow[k\to\infty]{}0.\qquad(10)$$
It follows, by Lemma 4.1, that
$$\lim_{k\to\infty}\bar d(X,X^{[k]})=0.$$
We also have, for any sufficiently large $k$, that $X^{[k]}$ is an ergodic Markov chain, since $\Theta^{[k]}[0]$ is non-empty. Now, by the $\bar d$-closure of the set of processes isomorphic to a Bernoulli shift (see for example Shields (1996), Theorem IV.2.10, p. 228) and the fact that ergodic Markov processes have the Bernoulli property (Shields (1996), Theorem IV.2.10, p. 227), we conclude that the processes considered in Comets et al. (2002), Gallo (2011), Gallo & Garcia (2011) and De Santis & Piccioni (2010) have the Bernoulli property.



4.2. Kernels with summable continuity rate. Let us first define continuity. 

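Definition 4.1 (Continuity points and continuous kernels). For any $k\in\mathbb N$, $a$ and $a_{-k}^{-1}$, let $\alpha_k(a|a_{-k}^{-1}):=\inf_{\underline z}P(a|a_{-k}^{-1}\underline z)$ and $\alpha_k(a_{-k}^{-1}):=\sum_{a\in A}\alpha_k(a|a_{-k}^{-1})$. A past $\underline a$ is called a continuity point for $P$, or $P$ is said to be continuous in $\underline a$, if
$$\alpha_k(a_{-k}^{-1})\xrightarrow[k\to+\infty]{}1.$$
We say that $P$ is continuous when
$$\alpha_k:=\inf_{a_{-k}^{-1}}\alpha_k(a_{-k}^{-1})\xrightarrow[k\to+\infty]{}1.$$
We say that $P$ has summable continuity rate when $\sum_{k\ge0}(1-\alpha_k)<\infty$. Let us also define weak non-nullness.

Definition 4.2. We say that a kernel $P$ is weakly non-null if $\alpha_0>0$, where $\alpha_0:=\sum_{a\in A}\alpha_0(a|\emptyset)$.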



De Santis & Piccioni (2010) have introduced a more general assumption that we call very weak non-nullness, see Definition 5.2. We postpone this definition to Section 5.3 in order to avoid technicalities at this stage.

Theorem 4.1. Assume that $P$ has summable continuity rate and is very weakly non-null. Then there exists a constant $C<+\infty$ such that, for any sufficiently large $k$,
$$\bar d(X,X^{[k]})\le C(1-\alpha_k).$$

Remark 4.1. This upper bound is new since we do not assume weak non-nullness. Fernandez & Galves (2002) showed, under weak non-nullness, that for any sufficiently large $k$ there exists a positive constant $C$ such that
$$\bar d(X,X^{[k]})\le C\beta_k$$
where
$$\beta_k:=\sup\{|P(a|a_{-k}^{-1}\underline x)-P(a|a_{-k}^{-1}\underline y)| : a\in A,\ a_{-k}^{-1}\in A^k,\ \underline x,\underline y\in A^{-\mathbb N}\}.$$
This latter quantity is related to $\alpha_k$ through the inequalities $1-\alpha_k\ge\beta_k$ and $1-\alpha_k\le D\beta_k$, for some $D>1$ and sufficiently large $k$. Moreover, $1-\alpha_k=\beta_k$ for binary alphabets. Thus, Theorem 4.1 extends the bound in Fernandez & Galves (2002).
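The identity $1-\alpha_k=\beta_k$ for binary alphabets follows from $\inf P(2|\cdot)=1-\sup P(1|\cdot)$: for each suffix, $1-\alpha_k(a_{-k}^{-1})$ is the oscillation of $P(1|a_{-k}^{-1}\,\cdot)$. The Python sketch below checks it numerically on a randomly generated kernel on $\{1,2\}$ that depends on the last four symbols only, so that all infima become minima over finitely many pasts. This finite-memory kernel is a hypothetical test case, not one of the paper's examples, and the function names are ours.

```python
import itertools
import random

A = (1, 2)


def random_finite_memory_kernel(order, seed=0):
    """Hypothetical kernel on {1, 2} depending on the last `order` symbols only.

    Keys are pasts (a_{-order}, ..., a_{-1}); values are P(1 | past).
    """
    rng = random.Random(seed)
    return {past: rng.uniform(0.05, 0.95) for past in itertools.product(A, repeat=order)}


def alpha_beta(kernel, order, k):
    """Return (alpha_k, beta_k) of Definition 4.1 and Remark 4.1, 0 <= k <= order."""
    alpha, beta = 1.0, 0.0
    for suffix in itertools.product(A, repeat=k):            # a_{-k}, ..., a_{-1}
        # P(1 | a_{-k}^{-1} z) over all older symbols z that still matter
        probs = [kernel[z + suffix] for z in itertools.product(A, repeat=order - k)]
        lo, hi = min(probs), max(probs)
        alpha = min(alpha, lo + (1.0 - hi))   # inf P(1|.) + inf P(2|.), inf over suffixes
        beta = max(beta, hi - lo)             # |P(2|x) - P(2|y)| = |P(1|x) - P(1|y)|
    return alpha, beta


kernel = random_finite_memory_kernel(order=4, seed=3)
for k in range(5):
    a_k, b_k = alpha_beta(kernel, 4, k)
    print(k, round(1 - a_k, 12), round(b_k, 12))   # the two columns coincide
```

On a larger alphabet the same computation only gives $1-\alpha_k\ge\beta_k$, in line with the inequalities stated above.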
4.3. Using a prior knowledge of the histories that occur. De Santis & Piccioni (2010) introduced the following assumptions on kernels. Define
$$\forall\,k\ge1,\quad\mathcal J_k(U_{-k}^{-1}):=\{\underline x\in A^{-\mathbb N}\ \text{s.t.}\ \forall\,1\le l\le k,\ x_{-l}=a\ \text{if}\ U_{-l}\in I(a|\emptyset)\ \text{for some}\ a\in A\},$$
$$\Lambda_0:=\alpha_0\quad\text{and}\quad\forall\,k\ge1,\quad\Lambda_k(U_{-k}^{-1}):=\inf\{\alpha_k(x_{-k}^{-1}) : \underline x\in\mathcal J_k(U_{-k}^{-1})\}.$$
Finally, let
$$\xi(U_{-\infty}^{0}):=\inf\{j\in\mathbb N : U_0<\Lambda_j(U_{-j}^{-1})\}.\qquad(11)$$

Theorem 4.2. If $X$ has a kernel that satisfies $\mathbb E\Big[\Big(\prod_{k\ge0}\Lambda_k(U_{-k}^{-1})\Big)^{-1}\Big]<\infty$, then there exists a positive constant $C<+\infty$ such that
$$\bar d(X,X^{[k]})\le C\,\mathbb P(\xi(U_{-\infty}^{0})>k).$$
In order to illustrate the interest of this result, let us give two simple examples. Other examples can be found in De Santis & Piccioni (2010) and Gallo & Garcia (2011).



Summable continuity regime with weak non-nullness. Theorem 4.2 allows us to recover the result of Theorem 4.1 in the weakly non-null case. To see this, it is enough to observe that, for any $U_{-k}^{-1}$, $\Lambda_k(U_{-k}^{-1})\ge\alpha_k$ (see Definition 4.1 for $\alpha_k$). It follows that $\prod_{k\ge0}\alpha_k>0$ (which is equivalent to $\sum_{k\ge0}(1-\alpha_k)<+\infty$) implies that $\prod_{k\ge0}\Lambda_k(U_{-k}^{-1})$ is bounded away from zero; hence its inverse has finite expectation. Hence, Theorem 4.2 applies and gives
$$\bar d(X,X^{[k]})\le C\,\mathbb P(\xi(U_{-\infty}^{0})>k)\le C(1-\alpha_k).$$



A simple discontinuous kernel on $A=\{1,2\}$. Let $\epsilon\in(0,1/2)$ and let $\{p_i\}_{i\ge0}$ be any sequence such that $\epsilon\le p_i\le1-\epsilon$ for any $i\ge0$. Let $t(\underline a):=\inf\{i\ge0 : a_{-i-1}=2\}$ and let $P$ be the following kernel:
$$\forall\,\underline a\in\{1,2\}^{-\mathbb N},\quad P(2|\underline a)=p_{t(\underline a)}.\qquad(12)$$
The existence of a unique stationary chain compatible with this kernel is proven in Gallo (2011), for instance. This chain is a renewal sequence, that is, a concatenation of blocks of the form $1\dots12$ having random length with finite expectation. It is clearly weakly non-null; however, it is not necessarily continuous. In fact, a simple calculation shows that $\alpha_k=1-\sup_{i,m\ge k}|p_i-p_m|$, which need not go to 1. Nevertheless, if we assume furthermore that $\sup_{k\ge0}\alpha_k>1-2\alpha(1)\alpha(2)$, we have $\mathbb E\Big[\Big(\prod_{k\ge0}\Lambda_k(U_{-k}^{-1})\Big)^{-1}\Big]<\infty$ (see Example 1 in De Santis & Piccioni (2010)). We now want to derive an upper bound for $\mathbb P(\xi(U_{-\infty}^{0})>k)$. First, define
$$N(U_{-\infty}^{-1}):=\inf\{n\ge1 : U_{-n}\in I(2)\},$$
and observe that $\Lambda_k(U_{-k}^{-1})=1$ for any $k\ge N(U_{-\infty}^{-1})$. It follows that
$$\mathbb P(\xi(U_{-\infty}^{0})>k)=\mathbb P(\inf\{j\in\mathbb N : U_0<\Lambda_j(U_{-j}^{-1})\}>k)$$
$$\le\mathbb P(U_0>\Lambda_k(U_{-k}^{-1}))\le\mathbb P(\Lambda_k(U_{-k}^{-1})<1)\le\mathbb P(N(U_{-\infty}^{-1})>k)\le(1-\epsilon)^k.$$
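The two quantities used in this example are easy to evaluate. The Python sketch below computes $\alpha_k=1-\sup_{i,m\ge k}|p_i-p_m|$ and estimates $\mathbb P(N(U_{-\infty}^{-1})>k)=(1-\alpha(2))^k$ by simulation, where $\alpha(2)=\inf_ip_i\ge\epsilon$. It assumes, as hypothetical conventions, that $(p_i)$ is given by a finite list whose last value is repeated forever and that $I(2)=[0,\alpha(2)[$; the function names are ours.

```python
import random


def alpha_k_renewal(p, k):
    """alpha_k = 1 - sup_{i, m >= k} |p_i - p_m| for kernel (12).

    Hypothetical encoding: p_i = p[i] for i < len(p) and p_i = p[-1] afterwards.
    """
    tail = p[min(k, len(p) - 1):]              # the values p_i, i >= k
    return 1.0 - (max(tail) - min(tail))


def prob_N_exceeds(alpha2, k, n=20_000, seed=0):
    """Monte Carlo estimate of P(N > k), N = inf{n >= 1 : U_{-n} in I(2)}, I(2) = [0, alpha2)."""
    rng = random.Random(seed)
    hits = sum(all(rng.random() >= alpha2 for _ in range(k)) for _ in range(n))
    return hits / n


p = [0.2, 0.7, 0.4, 0.5]                     # then p_i = 0.5 for all i >= 3
print([round(alpha_k_renewal(p, k), 12) for k in range(5)])  # [0.5, 0.7, 0.9, 1.0, 1.0]
alpha2 = min(p)                              # alpha(2) = inf_i p_i
print(prob_N_exceeds(alpha2, 3), (1 - alpha2) ** 3)
```

With a sequence such as $p_i$ alternating between two distinct values forever, $\alpha_k$ would stay bounded away from 1, which is the discontinuous situation described above; the finite-list encoding used here can only approximate it.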

Observation 4.2. The preceding theorems yield explicit upper bounds. However, they hold under restrictions we would like to overcome.

First, in the continuous regime, we have assumed that $\sum_{k\ge0}(1-\alpha_k)<+\infty$. Nevertheless, CFTP algorithms are known to exist under the weaker assumption $\sum_{k\ge1}\prod_{i=0}^{k}\alpha_i=+\infty$, and it is known that $\bar d(X,X^{[k]})$ goes to zero in this case. We will be interested in upper bounds for the rate of convergence to zero under these weak conditions.

Second, in Theorem 4.2, the assumption $\mathbb E\Big[\Big(\prod_{k\ge0}\Lambda_k(U_{-k}^{-1})\Big)^{-1}\Big]<\infty$ is generally difficult to check: this is particularly clear for the kernel (12), where it requires the (not necessary) extra assumption $\sup_{k\ge0}\alpha_k>1-2\alpha(1)\alpha(2)$.

The next section addresses part of these objections.

4.4. A simple upper bound under weak non-nullness. Hereafter, we assume that $P$ is weakly non-null. Let $\Theta'[0]$ be the following subset of $\Theta[0]$:
$$\Theta'[0]:=\{i\le0 : \text{for any}\ \underline a,\ L\big(\underline aF_{\{i,j-1\}}(\underline a,U_i^{j-1}),U_j\big)\le j-i,\ j=i,\dots,0\}.\qquad(13)$$
We have the following theorem, in which nothing is assumed a priori on the continuity.

Theorem 4.3. Assume that $P$ is weakly non-null and that we can construct a set of range partitions $\{\mathcal I(\underline a)\}_{\underline a}$ for which $\Theta'[0]\neq\emptyset$, $\mathbb P$-a.s. Then, for any $\theta[0]\in\Theta'[0]$,
$$\bar d(X,X^{[k]})\le\mathbb P(\theta[0]<-k).\qquad(14)$$

In order to illustrate this result, let us consider the examples of continuous kernels and of the kernel (12). Gallo & Garcia (2011) proposed a unified framework, including these examples and several other cases, which provides more examples of applications of this theorem. This is postponed to Appendix A in order to avoid technicalities.

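Application to the continuity regime. Let us first introduce the following range partition.

Definition 4.3. Let $\{\mathcal I^{(1)}(\underline a)\}_{\underline a}$ be the range partition such that, for any $a$ and $\underline a$,
$$\forall\,k\ge0,\quad|I^{(1)}_k(a|a_{-k}^{-1})|=\alpha_k(a|a_{-k}^{-1})-\alpha_{k-1}(a|a_{-(k-1)}^{-1}),$$
$$|I^{(1)}_\infty(a|\underline a)|=P(a|\underline a)-\lim_{k\to\infty}\alpha_k(a|a_{-k}^{-1}),$$
with the convention $\alpha_{-1}(a|\emptyset)=0$.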

Let $F^{(1)}$ and $L^{(1)}$ be the associated functions defined in (5) and (6). Let
$$\theta[0]:=\max\{i\le0 : U_j<\alpha_{j-i},\ j=i,\dots,0\}.$$
Observe that $\mathcal I^{(1)}$ satisfies $L^{(1)}(\underline a,U_0)\le k$ whenever $U_0<\alpha_k$. Hence, $\theta[0]$ belongs to $\Theta'^{(1)}[0]$, the set defined by (13) using $F^{(1)}$ and $L^{(1)}$. Moreover, Comets et al. (2002) proved that, if $\sum_{k\ge1}\prod_{i=0}^{k}\alpha_i=+\infty$ (that is, under weak non-nullness but not necessarily summable continuity),
$$\mathbb P(\theta[0]<-k)\le v_k:=\sum_{j\ge1}\ \sum_{\substack{t_1,\dots,t_j\ge1\\ t_1+\dots+t_j=k}}\ \prod_{m=1}^{j}(1-\alpha_{t_m})\prod_{i=0}^{t_m-2}\alpha_i\qquad(15)$$
which goes to 0. This upper bound is not very satisfactory, since it is difficult to handle in general. Nevertheless, Propositions B.1 and B.2, given in Appendix B, shed light on the behavior of this vanishing sequence. In particular, under the summable continuity assumption $\sum_{k\ge0}(1-\alpha_k)<+\infty$, Proposition B.1 states that (15) essentially recovers the rates of Theorems 4.1 and 4.2. Also, if there exist a constant $r\in(0,1)$ and a summable sequence $(s_k)_{k\ge1}$ such that, for all $k\ge1$, $1-\alpha_k=\frac rk+s_k$, then, from Proposition B.2, there exists a positive constant $C$ such that
$$\bar d(X,X^{[k]})\le C\,\frac{(\log k)^{3+r}}{k^{2-(1+r)^2}}.\qquad(16)$$
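The event $\{\theta[0]<-k\}$ in (14)-(15) only involves $U_{-k},\dots,U_0$, so its probability can be estimated by direct simulation for a given sequence $(\alpha_k)$. The Python sketch below does this; the function names are ours, and values of $\alpha_m$ beyond the given list are taken equal to its last entry, a hypothetical convention. In the degenerate case $\alpha_0=a$ and $\alpha_m=1$ for $m\ge1$, one has $\theta[0]\ge-k$ if and only if $U_i<a$ for some $-k\le i\le0$, hence $\mathbb P(\theta[0]<-k)=(1-a)^{k+1}$, which gives a simple check.

```python
import random


def theta_below(alpha, k, rng):
    """Simulate U_{-k}, ..., U_0 and return True iff theta[0] < -k.

    theta[0] = max{i <= 0 : U_j < alpha_{j-i} for j = i, ..., 0}; values alpha_m
    with m >= len(alpha) are taken equal to alpha[-1] (hypothetical convention).
    """
    u = [rng.random() for _ in range(k + 1)]          # u[m] stores U_{-m}
    for i in range(0, -k - 1, -1):                    # candidates i = 0, -1, ..., -k
        if all(u[-j] < alpha[min(j - i, len(alpha) - 1)] for j in range(i, 1)):
            return False                              # theta[0] >= i >= -k
    return True


def estimate_tail(alpha, k, n=20_000, seed=0):
    """Monte Carlo estimate of P(theta[0] < -k)."""
    rng = random.Random(seed)
    return sum(theta_below(alpha, k, rng) for _ in range(n)) / n


# Degenerate check: alpha_0 = 0.4 and alpha_m = 1 for m >= 1 give (1 - 0.4)^(k + 1).
print(estimate_tail([0.4, 1.0], 3), 0.6 ** 4)
```

For realistic sequences $(\alpha_k)$ such an estimate can be compared numerically with $v_k$ and with the rate (16).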



Application to the kernel (12). As a second direct application of Theorem 4.3, let us consider the kernel $P$ defined in (12). Let $\{\mathcal I^{(2)}(\underline a)\}_{\underline a}$ be the set of range partitions such that $|I^{(2)}_0(2|\emptyset)|=\alpha(2)$, $|I^{(2)}_0(1|\emptyset)|=\alpha(1)$ and $I^{(2)}_k(a|a_{-k}^{-1})=\emptyset$ for any $k\ge1$, except for $k=t(\underline a)+1$, for which $|I^{(2)}_k(1|a_{-k}^{-1})|=1-p_{t(\underline a)}-\alpha(1)$ and $|I^{(2)}_k(2|a_{-k}^{-1})|=p_{t(\underline a)}-\alpha(2)$. It satisfies
$$L^{(2)}(\underline a,U_0)=(t(\underline a)+1)\,\mathbf 1\{U_0\ge\alpha_0\}.$$
Hence, $\theta[0]:=\max\{i\le0 : U_i\in I(2)\}$ belongs to $\Theta'[0]$ (where $\Theta'[0]$ is defined by (13) with the functions $F^{(2)}$ and $L^{(2)}$ obtained from the set of range partitions $\{\mathcal I^{(2)}(\underline a)\}_{\underline a}$). Therefore,
$$\bar d(X,X^{[k]})\le\mathbb P(\theta[0]<-k)\le(1-\epsilon)^k\qquad(17)$$
independently of the value of $\sup_{k\ge0}\alpha_k$. For this simple example, Theorem 4.3 is thus less restrictive than Theorem 4.2.
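The argument behind (17) is constructive, and the corresponding perfect sampler is short. The Python sketch below implements, for the kernel (12) and the range partition $\mathcal I^{(2)}$, a search for the largest element of $\Theta'[0]$ defined in (13): for $i=0,-1,\dots$ it builds the symbols from time $i$ on, using only the symbols already built, and gives up on $i$ as soon as a range exceeds $j-i$. The layout of the intervals inside $[0,1[$, the encoding of $(p_i)$ by a list with constant tail, and all names are our assumptions. The search stops at the latest at $\max\{i\le0:U_i\in I(2)\}$, and the returned symbol is $[\Phi(U)]_0$.

```python
import random


def make_update(p, alpha1, alpha2):
    """Partial update function for kernel (12) with the range partition I^(2).

    Hypothetical layout of [0, 1): I_0(2) = [0, alpha2), I_0(1) = [alpha2, alpha0),
    then, at level t(a) + 1, first I(2|.) of length p_t - alpha2, then I(1|.);
    alpha0 = alpha1 + alpha2 and p_i = p[-1] for i >= len(p).
    `suffix` lists the symbols already built (oldest first). Returns the next
    symbol, or None when the range t(a) + 1 exceeds len(suffix).
    """
    alpha0 = alpha1 + alpha2

    def update(suffix, u):
        if u < alpha2:
            return 2                        # range 0
        if u < alpha0:
            return 1                        # range 0
        if 2 not in suffix:
            return None                     # the last 2 lies before the known suffix
        t = suffix[::-1].index(2)           # t(a): number of 1's since the last 2
        pt = p[min(t, len(p) - 1)]
        return 2 if u < alpha0 + (pt - alpha2) else 1

    return update


def perfect_sample(update, rng):
    """Return [Phi(U)]_0, using the largest i in Theta'[0] of (13)."""
    uniforms = []                           # uniforms[m] stores U_{-m}; never redrawn
    i = 0
    while True:
        while len(uniforms) <= -i:
            uniforms.append(rng.random())
        built = []                          # symbols at times i, ..., j - 1
        for m in range(-i, -1, -1):         # times j = i, ..., 0
            x = update(built, uniforms[m])
            if x is None:                   # range > j - i: i is not in Theta'[0]
                break
            built.append(x)
        else:
            return built[-1]                # every range was <= j - i
        i -= 1


update = make_update([0.5, 0.2], alpha1=0.5, alpha2=0.2)
rng = random.Random(2)
xs = [perfect_sample(update, rng) for _ in range(20_000)]
print(xs.count(2) / len(xs), 1 / 3.5)        # empirical vs stationary P(X_0 = 2)
```

With $p=(0.5,0.2,0.2,\dots)$ one has $\alpha(2)=\inf_ip_i=0.2$ and $\alpha(1)=\inf_i(1-p_i)=0.5$, and the stationary probability of the symbol 2 is $1/\sum_{j\ge0}\prod_{i<j}(1-p_i)=1/3.5$.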



5. PROOFS OF THE RESULTS 



5.1. Proof of Lemma 3.1. Assume that for some $k\ge0$ we have
$$\sum_{i=0}^{k}|I_i(a|a_{-i}^{-1})|>\inf_{\underline z}P(a|a_{-k}^{-1}\underline z).$$
Then, consider a past $\underline z^*$ such that $\sum_{i=0}^{k}|I_i(a|a_{-i}^{-1})|>P(a|a_{-k}^{-1}\underline z^*)$, and write $\underline x^*:=a_{-k}^{-1}\underline z^*$. As, for any $l\ge k+1$, $|I_l(a|(x^*)_{-l}^{-1})|\ge0$, we have
$$\sum_{i=0}^{k}|I_i(a|a_{-i}^{-1})|+\sum_{l\ge k+1}|I_l(a|(x^*)_{-l}^{-1})|>P(a|\underline x^*).$$
This contradicts the second property of the partition, which concludes the proof. □



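5.2. Proof of Lemma 4.1. We assume that $\Theta[0]\cap\Theta^{[k]}[0]$ is $\mathbb P$-a.s. non-empty, and we therefore have a coupling $(\Phi(U),\Phi^{[k]}(U))$ of both chains. By definition of $F^{[k]}$ and $L^{[k]}$, we observe that
$$L(\underline ba_{-k}^{-1},U_0)\le k\ \Longrightarrow\ \text{for any}\ \underline b,\quad L(\underline ba_{-k}^{-1},U_0)=L^{[k]}(a_{-k}^{-1},U_0)\ \text{and}\ F(\underline ba_{-k}^{-1},U_0)=F^{[k]}(a_{-k}^{-1},U_0).\qquad(18)$$
Assume that, for all $\underline a\in A^{-\mathbb N}$ and all $i=\theta[0],\dots,0$, $L\big(\underline aF_{\{\theta[0],i-1\}}(\underline a,U_{\theta[0]}^{i-1}),U_i\big)\le k$. Then, using (18) recursively, $F_{\{\theta[0],0\}}(\underline a,U_{\theta[0]}^{0})=F^{[k]}_{\{\theta[0],0\}}(\underline a,U_{\theta[0]}^{0})$. In particular, $\theta[0]\in\Theta^{[k]}[0]$ and $[\Phi(U)]_0=[\Phi^{[k]}(U)]_0$. Therefore,
$$\bar d(X,X^{[k]})\le\mathbb P\big([\Phi(U)]_0\neq[\Phi^{[k]}(U)]_0\big)\le\mathbb P\Big(\bigcup_{\underline a}\bigcup_{i=\theta[0]}^{0}\big\{L\big(\underline aF_{\{\theta[0],i-1\}}(\underline a,U_{\theta[0]}^{i-1}),U_i\big)>k\big\}\Big).\qquad\square$$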
5.3. Proof of Theorem 4.1. This section is divided into three parts. First, as mentioned before the statement of the theorem, we define very weak non-nullness. Then, we prove some technical lemmas allowing us to apply Lemma 4.1. Finally, we prove the theorem.



5.3.1. Definition of very weak non-nullness. Consider the set of range partitions $\{\mathcal I^{(1)}(\underline a)\}_{\underline a}$ of Definition 4.3. As observed by De Santis & Piccioni (2010), in the continuous case, since $\{\alpha_k\}_{k\ge0}$ increases monotonically to 1, there exists $k\ge0$ such that $\alpha_k>0$. Let $k^*$ be the smallest of these integers and let $F^*$ be the following update function:
$$F^*(a_{-k^*}^{-1},U_0):=F^{(1)}(\underline b,U_0\alpha_{k^*})\qquad\forall\,\underline b\ \text{s.t.}\ b_{-k^*}^{-1}=a_{-k^*}^{-1}.$$
$F^*$ is well defined, since $U_0\alpha_{k^*}<\alpha_{k^*}$, hence $L^{(1)}(\underline b,U_0\alpha_{k^*})\le k^*$. In the case where $k^*=0$, $F^*$ is simply defined as
$$F^*(\emptyset,U_0):=F^{(1)}(\underline b,U_0\alpha_0)=\sum_{a\in A}a\,\mathbf 1\{U_0\alpha_0\in I_0(a|\emptyset)\}\qquad\forall\,\underline b.$$

Definition 5.1 (Coalescence set). For $m\ge k^*+1$, let $E_m$, the coalescence set (different from the set of coalescence times), be defined as the set of all $u_{-m+1}^{0}\in[0,1[^m$ such that
$$F^*_{\{-k^*+1,0\}}\Big(a_{-k^*}^{-1}F^*_{\{-m+1,-k^*\}}\big(a_{-k^*}^{-1},u_{-m+1}^{-k^*}\big),\,u_{-k^*+1}^{0}\Big)\ \text{does not depend on}\ a_{-k^*}^{-1}.$$
When $k^*=0$ and $m=1$, we have $E_1:=\bigcup_{a\in A}I_0(a|\emptyset)$.

Definition 5.2. We say that $P$ is very weakly non-null if
$$\exists\,m\ge k^*+1\ \text{s.t.}\ \mathbb P(U_{-m+1}^{0}\in E_m)>0.\qquad(19)$$
Weak non-nullness corresponds to $\mathbb P(U_0\in E_1)>0$; hence, it implies very weak non-nullness.

5.3.2. Technical lemmas. Let $\Theta^{(1)}[0]$ be the set of coalescence times defined by (3) for the function $F^{(1)}$. In a first part of the proof, we define a random time $\theta[0]$ (see (20)) and we show that it belongs to $\Theta^{(1)}[0]$ and that it has finite expectation whenever $\sum_{k\ge k^*}(1-\alpha_k)<+\infty$. This random variable is defined in the proof of Theorem 2 in De Santis & Piccioni (2010).



Recall that, by construction of the range partition $\{\mathcal I^{(1)}(\underline a)\}_{\underline a}$, for any $\underline a$, $L(\underline a,U_i)=k$ whenever $\alpha_{k-1}\le U_i<\alpha_k$. This means that the sequence of ranges forms a sequence $\{L_i\}_{i\in\mathbb Z}:=\{L(\underline a,U_i)\}_{i\in\mathbb Z}$ of i.i.d. $\mathbb N$-valued r.v.'s. We now introduce two sequences of random times in the past, which are represented in Figure 2 in the particular case where $k^*=2$. Let
$$W_1:=\sup\{m\le0 : U_j<\alpha_{j-m+k^*},\ j=m,\dots,0\},$$
and, for any $i\ge1$,
$$Y_i:=\inf\{m<W_i : U_n<\alpha_{k^*},\ n=m+1,\dots,W_i\}$$
and
$$W_{i+1}:=\sup\{m<Y_i : U_j<\alpha_{j-m+k^*},\ j=m,\dots,Y_i\}.$$
Consider now the random variable
$$Q:=\inf\{i\ge1 : (U_{Y_i+1},\dots,U_{W_i-1})\in E_{W_i-Y_i-1}\}$$
(see Definition 5.1 for $E_m$) and put
$$\theta[0]:=Y_Q.\qquad(20)$$
[Figure 2: time axis ending at $-2,-1,0$, showing the random times $Y_Q, W_Q, Y_{Q-1},\dots,Y_1, W_1$ and the blocks $B_Q,\dots,B_1$.]

FIGURE 2. We consider a realization of $L^{(1)}$ in the particular case $k^*=2$, that is, the arrows, which represent the length function at each time index, have length larger than or equal to 2.

Lemma 5.1. $\theta[0]\in\Theta^{(1)}[0]$.

Proof. If $\theta[0]=-k$, then there exists some $l\ (=-W_Q+1)\le k$ such that $U_{-i}<\alpha_{k^*}$, $i=l,\dots,k$, and moreover $U_{-k}^{-l}\in E_{k-l+1}$, that is,
$$F^*_{\{-l-k^*+1,-l\}}\Big(a_{-k^*}^{-1}F^*_{\{-k,-l-k^*\}}\big(a_{-k^*}^{-1},U_{-k}^{-l-k^*}\big),\,U_{-l-k^*+1}^{-l}\Big)$$
is independent of $a_{-k^*}^{-1}$. Since $U_{-i}<\alpha_{k^*}$, $i=l,\dots,k$, it follows that
$$F^{(1)}_{\{-l-k^*+1,-l\}}\Big(\underline bF^{(1)}_{\{-k,-l-k^*\}}\big(\underline b,U_{-k}^{-l-k^*}\big),\,U_{-l-k^*+1}^{-l}\Big)$$
is independent of $\underline b$.

By definition of the random times $W_i$, all the symbols at times $\{W_Q,\dots,0\}$ can then be built using those at times $\{W_Q-k^*,\dots,W_Q-1\}$, since none of the arrows from time $W_Q$ until $0$ goes further back than time $W_Q-k^*$, see Figure 2. Therefore, the construction of the symbol at time $0$ does not depend on the symbols before $\theta[0]$, i.e., $\theta[0]\in\Theta^{(1)}[0]$. □

Lemma 5.2. $\mathbb E|W_1|<+\infty$ whenever $\sum_{k\ge k^*}(1-\alpha_k)<+\infty$.

Proof. Letting $\bar\alpha_{l-k^*}:=\alpha_l$ for any $l\ge k^*$, we have
$$W_1=\sup\{m\le0 : U_j<\bar\alpha_{j-m},\ j=m,\dots,0\}.$$
Thus $W_1$ is defined exactly as $\tau[0]$ of display (4.2) in Comets et al. (2002), substituting their $a_k$'s by our $\bar\alpha_k$'s. They proved (see display (4.6) and item (ii) of Proposition 5.1 therein) that $\mathbb E|\tau[0]|<+\infty$ whenever $\sum_{k\ge0}(1-a_k)<+\infty$. It follows that $\mathbb E|W_1|<+\infty$ whenever $\sum_{k\ge k^*}(1-\alpha_k)<+\infty$. □

Lemma 5.3. $\mathbb E|\theta[0]|<+\infty$ whenever $\sum_{k\ge k^*}(1-\alpha_k)<+\infty$.

Proof. As observed in De Santis & Piccioni (2010), $\{W_i-Y_i-1\}_{i\ge1}$ is a sequence of i.i.d. geometric r.v.'s with success probability $1-\alpha_{k^*}$, and $\{Y_i-W_{i+1}\}_{i\ge0}$ (with $Y_0:=0$) is a sequence of i.i.d. r.v.'s distributed as $-W_1$, conditioned to be non-zero. Moreover, Lemma 5.2 states that $\mathbb E|W_1|<+\infty$. It follows that $\{B_i\}_{i\ge1}:=\{Y_{i-1}-Y_i\}_{i\ge1}$ is a sequence of i.i.d. $\mathbb N$-valued r.v.'s with finite expectation. Thus $\sum_{i=1}^{n}B_i-n\mathbb EB_1$ forms a martingale with respect to the filtration $\mathcal F(B_1,\dots,B_i : i\ge1)$, and by the optional sampling theorem we have
$$\mathbb E|\theta[0]|:=\mathbb E|Y_Q|=\mathbb E\Big[\sum_{i=1}^{Q}B_i\Big]=\mathbb EQ\cdot\mathbb EB_1<+\infty.$$
□



We finally need the following lemma. 
Lemma 5.4. For any $k\ge k^*$, $\theta[0]\in\Theta^{(1),[k]}[0]$.

Proof. For any $k\ge k^*$, $F^{(1),[k]}$ and $L^{(1),[k]}$ satisfy (18). This implies that, in the interval $\{Y_Q,\dots,W_Q-1\}$, coalescence occurs as well for $F^{(1),[k]}$, i.e., $U_{Y_Q}^{W_Q-1}\in E_{W_Q-\theta[0]}$. Both constructed chains are equal until the first time $F^{(1)}$ uses a range larger than $k$. But at this moment, due to the definition of the $W_i$'s, we have already perfectly simulated at least $k$ symbols of both chains, and therefore we can continue the construction until time $0$, because the ranges of $F^{(1),[k]}$ are smaller than or equal to $k$. It follows that $Y_Q$ is a coalescence time for $F^{(1),[k]}$, and therefore $\theta[0]\in\Theta^{(1),[k]}[0]$ for any $k\ge k^*$. □



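5.3.3. Proof of Theorem 4.1. By definition of $F^{(1)}$ and $L^{(1)}$, we have, for any sufficiently large $k$,
$$\bigcup_a\bigcup_{i=\theta[0]}^{0}\Big\{L^{(1)}\big(aF^{(1)}_{\{\theta[0],i-1\}}(a,U_{\theta[0]}^{i-1}),U_i\big)>k\Big\}\subset\bigcup_{i=\theta[0]}^{0}\{U_i>a_k\}.$$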



By Lemmas 5.1 and 5.4, Lemma 4.1 applies and gives, for sufficiently large $k$,
$$\bar d(X,X^{[k]})\le P\Big(\bigcup_{i=\theta[0]}^{0}\{U_i>a_k\}\Big)=P\Big(\sum_{i=0}^{|\theta[0]|}\mathbf 1\{U_{-i}>a_k\}\ge1\Big)\le E\Big(\sum_{i=0}^{|\theta[0]|}\mathbf 1\{U_{-i}>a_k\}\Big),$$
where we used the Markov inequality for the last inequality. Using the fact that $\theta[0]$ is a stopping time in the past for the sequence $U_i$, $i\le0$, and that it has finite expectation by Lemma 5.3, we can apply Wald's equality to obtain
$$\bar d(X,X^{[k]})\le E|\theta[0]|\cdot E\,\mathbf 1\{U_0>a_k\}=E|\theta[0]|\,(1-a_k).$$

5.4. Proof of Theorem 4.2. We divide this proof into two parts. First, we prove technical lemmas allowing us to use Lemma 4.1. Then, we prove the theorem.

5.4.1. Technical lemmas. Using the quantity $\ell(U_{-\infty}^{i})$ defined by (11), we define
$$\theta[0]:=\sup\{j\le0:\ \ell(U_{-\infty}^{i})\le i-j,\ i=j,\dots,0\}.\qquad(21)$$
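On a finite window, definition (21) is a simple backward scan. The sketch below assumes that the values $\ell(U_{-\infty}^{i})$ for $i=-N,\dots,0$ are already available as a list (supplied by hand here; computing them from the kernel is not attempted). It returns $\theta[0]$, or `None` when no admissible $j$ lies in the window. The function name is hypothetical.

```python
def theta0(ell):
    """theta[0] = sup{j <= 0 : ell_i <= i - j for i = j, ..., 0} on a finite window.

    ell[t] is the (hypothetical, user-supplied) value of ell(U_{-inf}^{i}) at
    time i = -t, for t = 0, ..., N.  Returns None if no j in {-N, ..., 0} works.
    """
    N = len(ell) - 1
    for depth in range(N + 1):
        j = -depth
        # every time i in [j, 0] must look back at most i - j steps, i.e. not before j
        if all(ell[-i] <= i - j for i in range(j, 1)):
            return j
    return None

print(theta0([2, 0, 0, 0]))  # -2: the range 2 needed at time 0 reaches back to time -2
```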

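Lemma 5.5. $\theta[0]$, defined by (21), belongs to $\Theta^{(1)}[0]\cap\Theta^{(1),[k]}[0]$ for any $k\ge0$, and
$$\bigcup_a\bigcup_{i=\theta[0]}^{0}\Big\{L^{(1)}\big(aF^{(1)}_{\{\theta[0],i-1\}}(a,U_{\theta[0]}^{i-1}),U_i\big)>k\Big\}\subset\bigcup_{i=\theta[0]}^{0}\{\ell(U_{-\infty}^{i})>k\}.\qquad(22)$$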



Proof. For any $U_{-k}^{-1}$, the way the sets of strings $\{zF^{(1)}_{\{-k,-1\}}(z,U_{-k}^{-1})\}_z$ and $J_k(U_{-k}^{-1})$ are defined ensures that the former is included in the latter. It follows that, for any $U_{-k}^{-1}$,
$$A_k(U_{-k}^{-1}):=\sum_{a\in A}\ \inf_{x:\,x_{-k}^{-1}\in J_k(U_{-k}^{-1})}P(a\mid x)\ \le\ \sum_{a\in A}\ \inf_{z}P\big(a\mid zF^{(1)}_{\{-k,-1\}}(z,U_{-k}^{-1})\big).$$
As the inequality $A_0\le\sum_{a\in A}\inf_zP(a\mid z)$ is also true, we deduce that, for any $k\ge0$, any $U_{-\infty}^{0}\in[0,1[^{-\mathbb N}$ and any $z\in A^{-\mathbb N}$,
$$\ell(U_{-\infty}^{0})\le k\ \Longrightarrow\ L^{(1)}\big(zF^{(1)}_{\{-k,-1\}}(z,U_{-k}^{-1}),U_0\big)\le k.$$
By induction, this means that, for all $\theta[0]\le i\le0$, $F^{(1)}_{\{\theta[0],i\}}(z,U_{\theta[0]}^{i})$ does not depend on $z$. Hence $\theta[0]$ is also a coalescence time for the update function $F^{(1)}$, that is, $\theta[0]\in\Theta^{(1)}[0]$. Observe that we have proved, more specifically, that $\theta[0]\in\Theta'^{(1)}[0]$, where $\Theta'^{(1)}[0]\subset\Theta^{(1)}[0]$ is defined by (13) using $F^{(1)}$ and $L^{(1)}$. By Lemma 5.6 below, this implies that $\theta[0]\in\Theta^{(1),[k]}[0]$ for any $k\ge0$ as well.

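We now prove the second statement of the lemma. If there exist $i\ge k$, $U_{-i}^{0}$ and $z$ such that $L^{(1)}\big(zF^{(1)}_{\{-i,-1\}}(z,U_{-i}^{-1}),U_0\big)>k$, then there exists some past $a$ (take $a=zF^{(1)}_{\{-i,-k-1\}}(z,U_{-i}^{-k-1})$ for instance) such that $L^{(1)}\big(aF^{(1)}_{\{-k,-1\}}(a,U_{-k}^{-1}),U_0\big)>k$. We now have the following sequence of inclusions:
$$\begin{aligned}
\bigcup_a\bigcup_{i=\theta[0]}^{0}&\Big\{L^{(1)}\big(aF^{(1)}_{\{\theta[0],i-1\}}(a,U_{\theta[0]}^{i-1}),U_i\big)>k\Big\}\\
&\subset\bigcup_a\bigcup_{i=\theta[0]}^{0}\Big\{L^{(1)}\big(aF^{(1)}_{\{\theta[0],i-1\}}(a,U_{\theta[0]}^{i-1}),U_i\big)>k\Big\}\cap\{\theta[0]\le i-k-1\}\\
&\subset\bigcup_a\bigcup_{i=\theta[0]}^{0}\Big\{L^{(1)}\big(aF^{(1)}_{\{i-k,i-1\}}(a,U_{i-k}^{i-1}),U_i\big)>k\Big\}\cap\{\theta[0]\le i-k-1\}\\
&\subset\bigcup_{i=\theta[0]}^{0}\{\ell(U_{-\infty}^{i})>k\}.
\end{aligned}$$
This concludes the proof of the lemma.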



□ 



Recall the definition (13) of $\Theta'[0]$ for generic range partitions of a weakly non-null kernel $P$. We will need the following lemma.

Lemma 5.6. For any $k\ge0$, $\Theta'[0]\subset\Theta^{[k]}[0]$.

Proof. Let $\theta[0]\in\Theta'[0]$. For any fixed $k\ge0$, we separate two cases.

(1) If $\theta[0]\ge-k$, then, by the definition of $\Theta'[0]$, the ranges used by $F$ from $\theta[0]$ to $0$ are all smaller than or equal to $k$; therefore, using (18), the lengths used by $F^{[k]}$ in the same interval of indexes are the same, and the constructed symbols are the same as well. Thus $\theta[0]\in\Theta^{[k]}[0]$.
(2) If $\theta[0]<-k$, then, by the definition of $\Theta'[0]$, we can apply the same method as in the preceding case and obtain that $\theta[0]$ is a coalescence time for $F^{[k]}$ for the time indexes from $\theta[0]$ up to $\theta[0]+k$. But $\theta[0]$ is also a coalescence time for the time indexes from $\theta[0]+k+1$ up to $0$, since the ranges used by $F^{[k]}$ are always smaller than or equal to $k$. Thus, in this case also, $\theta[0]\in\Theta^{[k]}[0]$.

□



5.4.2. Proof of Theorem 4.2. Under the conditions of this theorem, by Theorem 1 in De Santis & Piccioni (2010), $\theta[0]$ is $P$-a.s. finite. Moreover, by Lemma 5.5, $\theta[0]\in\Theta^{(1)}[0]\cap\Theta^{(1),[k]}[0]$ for any $k\ge0$. Thus we can apply Lemma 4.1 and obtain, using Lemma 5.5,
$$\bar d(X,X^{[k]})\le P\Big(\bigcup_{i=\theta[0]}^{0}\{\ell(U_{-\infty}^{i})>k\}\Big)$$
and moreover
$$P\Big(\bigcup_{i=\theta[0]}^{0}\{\ell(U_{-\infty}^{i})>k\}\Big)\le E\Big(\sum_{i=\theta[0]}^{0}\mathbf 1\{\ell(U_{-\infty}^{i})>k\}\Big).$$

Consider the σ-algebras $\mathcal F_k$ generated by $U_{-k}^{0}$, $k\ge0$. Then $\ell(U_{-\infty}^{0})$ is a stopping time with respect to $(\mathcal F_k)_{k\ge0}$ and, by definition, so is $\theta[0]$. Moreover, $\ell(U_{-\infty}^{i})$ is independent of $U_{i+1}^{0}$ by independence of the $U_j$'s. Finally, by stationarity, $\ell(U_{-\infty}^{i})$ has the same distribution as $\ell(U_{-\infty}^{0})$; hence $E\big(\mathbf 1\{\ell(U_{-\infty}^{i})>k\}\big)=E\big(\mathbf 1\{\ell(U_{-\infty}^{0})>k\}\big)$ for any $i\in\mathbb Z$. By Theorem 1 in De Santis & Piccioni (2010), $\theta[0]$ has finite expectation, hence we can use Wald's equality to obtain
$$\bar d(X,X^{[k]})\le E|\theta[0]|\,P\big(\ell(U_{-\infty}^{0})>k\big).\qquad(23)$$
This concludes the proof of Theorem 4.2.



5.5. Proof of Theorem 4.3. Recall the definition of the set $\Theta'[0]$ given by (13). If $\theta[0]\in\Theta'[0]$ and $\theta[0]\ge-k$, then we are sure that the range $L\big(aF_{\{\theta[0],i-1\}}(a,U_{\theta[0]}^{i-1}),U_i\big)\le k$ for any $i=\theta[0],\dots,0$ and any $a$; therefore
$$\bigcup_a\bigcup_{i=\theta[0]}^{0}\Big\{L\big(aF_{\{\theta[0],i-1\}}(a,U_{\theta[0]}^{i-1}),U_i\big)>k\Big\}\subset\{\theta[0]<-k\}.$$
By Lemma 5.6, any $\theta[0]\in\Theta'[0]$ also belongs to $\Theta^{[k]}[0]$ for any $k\ge0$. We can thus apply Lemma 4.1 and conclude the proof of the theorem.
Appendix A. Local continuity with respect to the past $\underline1$
In this section, we assume that $A=\{1,2\}$ and that $P$ has only one discontinuity point, the point $\underline1=\dots111$. We refer the interested reader to Gallo & Garcia (2011) for examples with countable alphabets and discontinuities in more complicated sets of pasts. To begin, we need the following definition.

Definition A.1 (Local continuity with respect to the past $\underline1$). We say that a kernel $P$ on $\{1,2\}$ is locally continuous with respect to the past $\underline1$ if, for all $i\ge0$,
$$\inf_{a_{-k}^{-i-2}}\ \sum_{a\in A}\ \inf_{z}P\big(a\mid 1^{i}2\,a_{-k}^{-i-2}z\big)$$
converges to 1 as $k$ diverges. We distinguish two particular situations of interest.

• We say that $P$ is strongly locally continuous with respect to $\underline1$ if there exists an integer function $\ell:\mathbb N\to\mathbb N$ such that
$$\forall i\ge0,\qquad \inf_{a_{-k}^{-i-2}}\ \sum_{a\in A}\ \inf_{z}P\big(a\mid 1^{i}2\,a_{-k}^{-i-2}z\big)=1\qquad(24)$$
for any $k\ge\ell(i)$, and

• we say that $P$ is uniformly locally continuous with respect to $\underline1$ if
$$\alpha_k:=\inf_{i\ge0}\ \inf_{a_{-k}^{-i-2}}\ \sum_{a\in A}\ \inf_{z}P\big(a\mid 1^{i}2\,a_{-k}^{-i-2}z\big)\qquad(25)$$
converges to 1 as $k$ diverges.

Strongly locally continuous kernels are known as probabilistic context trees, a model that was introduced by Rissanen (1983) as a universal data compression model. It was first considered from the "CFTP point of view" by Gallo (2011). The kernel $P$ is a simple example which is strongly and uniformly locally continuous with respect to $\underline1$.

Assumption 1: $P$ is strongly locally continuous with respect to $\underline1$.

Assumption 2: $P$ is uniformly locally continuous with respect to $\underline1$.

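Notation A.1. Let us introduce the following notation.

• Stationary chains compatible with kernels satisfying Assumptions $i=1$ and $2$ are denoted $X^{(i)}$, and the corresponding canonical $k$-steps Markov approximations are denoted $X^{(i),[k]}$.

• We use the notations $r_0^{(i)}:=\alpha_0$ for $i=1$ and $2$, and, for $k\ge1$,
$$r_k^{(1)}:=r_0^{(1)}\vee\Big(1-\big(1-\alpha(2)\big)^{\ell(k)}\Big),\qquad r_k^{(2)}:=r_0^{(2)}\vee\Big(1-\frac{1-\alpha_k}{\alpha(2)}\Big),$$
where $\ell$ and $\alpha_k$ are the parameters of the kernels under Assumptions 1 and 2 respectively.

• For $i=1$ and $2$,
$$v_k^{(i)}:=\sum_{j=1}^{k}\ \sum_{\substack{t_1,\dots,t_j\ge1\\t_1+\dots+t_j=k}}\ \prod_{m=1}^{j}\Big(\big(1-r^{(i)}_{t_m-1}\big)\prod_{l=0}^{t_m-2}r^{(i)}_l\Big),\qquad(26)$$
where $\prod_{l=0}^{-1}:=1$.

• And finally, for any $k\ge0$, let
$$u_k:=\lfloor k\alpha(2)/2\rfloor\,P\Big(\sum_{l=1}^{\lfloor k\alpha(2)/2\rfloor}\xi_l>k/2\Big),\qquad(27)$$
where the $\xi_l$ are i.i.d. geometric random variables with parameter $\alpha(2)$.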
It is well known that this sequence goes to $0$ exponentially fast (see Kallenberg (2002) for instance). An explicit upper bound is derived in Appendix C.



Corollary A.1. Under the weak non-nullness assumption, we have, for $i=1$ and $2$, that if $\sum_{k\ge1}\prod_{l=0}^{k}r^{(i)}_l=\infty$, then
$$\bar d\big(X^{(i)},X^{(i),[k]}\big)\le u_k+v^{(i)}_{\lfloor k\alpha(2)/2\rfloor}\xrightarrow[k\to\infty]{}0.\qquad(28)$$
The quantity defined in display (26) is related to the house of cards process presented in Appendix B (see equation (30)). We provide in Propositions B.1 and B.2 explicit upper bounds on the term (26) that can be plugged into (28). The term (27) is studied in Corollary C.1. It follows in particular from these propositions that, whenever $1-r^{(i)}_k$ is not exponentially decreasing, the leading term in (28) is $v^{(i)}_{\lfloor k\alpha(2)/2\rfloor}$, and therefore we obtain, for some constant $C>1$ and any sufficiently large $k$,
$$\bar d\big(X^{(i)},X^{(i),[k]}\big)\le C\,v^{(i)}_{\lfloor k\alpha(2)/2\rfloor}.$$
For instance, Proposition B.2 states that, if $1-r^{(i)}_k=\frac rk+s_k$, $k\ge1$, with $r\in(0,1)$ and $\{s_k\}_{k\ge1}$ any summable sequence, we obtain, for some constant $C>1$,
$$\bar d\big(X^{(i)},X^{(i),[k]}\big)\le C\,\frac{(\log k)^{3+r}}{k^{2-(1+r)^2}}.\qquad(29)$$



Proof. Under Assumptions 1 and 2 with weak non-nullness, Gallo & Garcia (2011) constructed a set of range partitions generating a set of coalescence times $\Theta'[0]$ which is a.s. non-empty. This is what is stated in Corollaries 6.1 and 6.2 (and the discussions following them) for Assumptions 1 and 2 respectively. They defined a random time $\Delta^{(i)}[0]$ (see display (34) therein) which belongs to $\Theta'[0]$, as stated by Lemma 8.1 therein. They also proved that $P(\Delta^{(i)}[0]<-k)$ is upper bounded by $u_k+v^{(i)}_{\lfloor k\alpha(2)/2\rfloor}$ (where $\{u_k\}_{k\ge1}$ has been defined by (27)). This is in fact stated in the proof of item (ii) of Theorem 5.2 therein. By Theorem 4.3, these upper bounds are therefore upper bounds for the $\bar d$-distance $\bar d(X^{(i)},X^{(i),[k]})$.

□



Appendix B. Some results on the House of Cards Markov chain

Fix a non-decreasing sequence $\{r_k\}_{k\ge0}$ of $[0,1]$-valued real numbers converging to 1. The house of cards Markov chain $H=\{H_n\}_{n\ge0}$ related to this sequence is the $\mathbb N$-valued Markov chain starting from state $0$ and having transition matrix $Q=\{Q(i,j)\}_{i\ge0,j\ge0}$, where $Q(i,j):=r_i\mathbf 1\{j=i+1\}+(1-r_i)\mathbf 1\{j=0\}$. Let us denote by $v_k:=\Pr(H_k=0)$ the probability that the house of cards is at state $0$ at time $k$. We want to derive explicit rates of convergence to $0$ of this sequence when $H$ is not positive recurrent. These results are used in Appendix A in order to obtain explicit upper bounds for $\bar d(X,X^{[k]})$ under several types of assumptions. Decomposing the event $\{H_k=0\}$ according to the possible returns of the process $\{H_m\}_{m=0,\dots,k}$ to $0$ yields, for any $k\ge1$,
$$v_k=\sum_{j=1}^{k}\ \sum_{\substack{t_1,\dots,t_j\ge1\\t_1+\dots+t_j=k}}\ \prod_{m=1}^{j}\Big((1-r_{t_m-1})\prod_{l=0}^{t_m-2}r_l\Big)\qquad(30)$$
where $\prod_{l=0}^{-1}:=1$. Although explicit, this expression cannot be used directly and has to be simplified. As a first insight, we borrow the following proposition from Bressaud et al. (1999).

Proposition B.1. (i) $v_k$ goes to zero as $k$ diverges if and only if $\sum_{m\ge1}\prod_{l=0}^{m}r_l=+\infty$;
(ii) $v_k$ is summable in $k$ if $1-r_k$ is summable in $k$;
(iii) $v_k$ behaves as $O(1-r_k)$ if $1-r_k$ is summable in $k$ and $\sup_j\limsup_{k\to+\infty}(\cdots)<1$ (see Bressaud et al. (1999) for this ratio condition);
(iv) $v_k$ goes to zero exponentially fast if $1-r_k$ decreases exponentially.



As observed in Bressaud et al. (1999), the conditions of item (iii) are satisfied if, for example, $1-r_k\sim(\log k)^{\eta}k^{-\zeta}$ for some $\zeta>1$ and any $\eta$. However, this is one of the only cases in which this proposition yields explicit rates. In the present paper, we prove the following proposition.

Proposition B.2. We have the following explicit upper bounds.

(i) A non-summable case: if $1-r_k=\frac rk+s_k$, $k\ge1$, where $r\in(0,1)$ and $\{s_n\}_{n\ge1}$ is a summable sequence, there exists a constant $C>0$ such that
$$v_k\le C\,\frac{(\log k)^{3+r}}{k^{2-(1+r)^2}}.$$
(ii) Generic summable case: if $t_\infty:=\prod_{k\ge0}r_k>0$, then
$$v_k\le\inf_{K=1,\dots,k}\Big\{K^2\big(1-r_{k/K}\big)+(1-t_\infty)^K\Big\}.$$
(iii) Exponential case: if $1-r_k\le C_rr^k$, $k\ge1$, for some $r\in(0,1)$ and a constant $C_r\in(0,\log\frac1r)$, then
$$v_k\le C_r\big(e^{C_r}r\big)^k.$$

B.1. Proof of Proposition B.2. Before we come to the proofs of each item of this proposition, let us collect some simple remarks on the house of cards Markov chain.

Let $\{T_k\}_{k\ge0}$ be the sequence of stopping times defined by $T_0:=0$ and, recursively, for any $k\ge1$, $T_k:=\inf\{l\ge T_{k-1}+1:\ H_l=0\}$. The Markov property ensures that the random variables $I_k:=T_k-T_{k-1}$, $k\ge1$, are i.i.d. and valued in $\mathbb N^*\cup\{\infty\}$, and it is easy to check that
$$\forall k\ge1,\qquad t_k:=\Pr(I_1=k)=(1-r_{k-1})\prod_{i=0}^{k-2}r_i,$$
where $\prod_{i=0}^{-1}:=1$. We have, for any $n\ge0$,
$$\Pr(H_n=0)=\Pr(\exists k\ge0 \text{ s.t. } T_k=n)=\sum_{k\ge0}\Pr(T_k=n).$$
We write $T_k=\sum_{l=1}^{k}I_l$; as all the $I_l$ are at least $1$, we have $\Pr(T_k=n)=0$ for all $k>n$. Therefore, for all $K\in[1,n]$,
$$\Pr(H_n=0)=\sum_{k=0}^{n}\Pr(T_k=n)=\sum_{k=0}^{K}\Pr(T_k=n)+\sum_{k=K+1}^{n}\Pr(T_k=n).\qquad(31)$$
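The decomposition $\Pr(H_n=0)=\sum_k\Pr(T_k=n)$ can be evaluated directly from the law $t_m$ of the excursion lengths, by convolving it with itself; only $k\le n$ contributes because $I_l\ge1$. The sketch below (hypothetical helper names) is meant as a cross-check of the formula for $t_m$ and of (31), not as an efficient algorithm, since it uses $O(n^3)$ operations.

```python
def interarrival_law(r, nmax):
    """t[m] = Pr(I_1 = m) = (1 - r_{m-1}) * prod_{i=0}^{m-2} r_i for m = 1..nmax (t[0] = 0)."""
    t = [0.0] * (nmax + 1)
    prod = 1.0  # running value of prod_{i=0}^{m-2} r_i (empty product = 1)
    for m in range(1, nmax + 1):
        t[m] = (1.0 - r(m - 1)) * prod
        prod *= r(m - 1)
    return t

def prob_return(r, n):
    """Pr(H_n = 0) = sum_{k=0}^{n} Pr(T_k = n), with the law of T_k = I_1 + ... + I_k
    obtained by repeated convolution of t (only k <= n matters since I_l >= 1)."""
    t = interarrival_law(r, n)
    law = [1.0] + [0.0] * n  # law of T_0 = 0, restricted to {0, ..., n}
    total = law[n]
    for _ in range(n):
        new = [0.0] * (n + 1)
        for s, p in enumerate(law):
            if p:
                for m in range(1, n - s + 1):
                    new[s + m] += p * t[m]
        law = new
        total += law[n]  # adds Pr(T_k = n) for the current k
    return total

print(prob_return(lambda i: 0.5 if i == 0 else 0.9, 2))  # 0.25 + 0.05 = 0.3 (up to rounding)
```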
Fact B.1. Let $K\in[1,n]$. We have $\Pr(\forall j\in[1,K],\ I_j\le n)=(1-u_{n+1})^K$. In particular,
$$\sum_{k=K+1}^{n}\Pr(T_k=n)=\Pr\Big(\exists k\in[K+1,n] \text{ s.t. } \sum_{l=1}^{k}I_l=n\Big)\le\Pr\big(\forall j\in[1,K],\ I_j\le n\big)=(1-u_{n+1})^K.$$
In order to control $\sum_{k=0}^{K}\Pr(T_k=n)=\Pr(\exists k=0,\dots,K,\ T_k=n)$, we can simply remark that, if there exists $k\in\{1,\dots,K\}$ such that $\sum_{i=1}^{k}I_i=n$, there necessarily exist $i\in[1,K]$ and $r\in[1,K]$ such that $I_i=n/r$. This implies that
$$\Pr(\exists k=0,\dots,K,\ T_k=n)\le\Pr\Big(\exists i\in[1,K],\ \exists r\in[1,K] \text{ s.t. } I_i=\frac nr\Big)\le\sum_{i=1}^{K}\sum_{r=1}^{K}\Pr\Big(I_i=\frac nr\Big).$$
We have obtained the following result.
Fact B.2. Let $K\in[1,n]$. We have
$$\sum_{k=0}^{K}\Pr(T_k=n)\le K^2t_{n/K}.$$

Restricting our attention to the summable case (that is, when $\sum_{k\ge0}(1-r_k)<+\infty$), the following fact is fundamental. Its proof is immediate.
Fact B.3. If $\sum_{n\ge0}(1-r_n)<\infty$, then $t_\infty:=\Pr(I_1=\infty)=\prod_{i\ge0}r_i>0$. In particular, $u_n:=\Pr(I_1\ge n)\ge t_\infty>0$. Moreover, for all $n\in\mathbb N$, $t_\infty(1-r_n)\le t_n\le1-r_n$.

Using Facts B.1, B.2 and B.3, we are ready to prove items (i) and (ii) of Proposition B.2.



Proof of Item (i) of Proposition B.2. As far as we know, all the results on the house of cards process hold in the summable case. When $\sum_{k\in\mathbb N}(1-r_k)=\infty$, it is only known that $\sum_{n\in\mathbb N}\Pr(H_n=0)=\infty$. It is interesting to notice that we can still obtain some rate of convergence for $\Pr(H_n=0)$ from our elementary facts, at least in the following example. Let us assume that there exist $r<1$ and a summable sequence $s_n$ such that, for all $n\ge1$, $1-r_n=\frac rn+s_n$. In this case, we have $\sum_{n\in\mathbb N}(1-r_n)=\infty$, therefore $t_\infty=0$. Nevertheless,
$$\prod_{i=0}^{n}r_i\le\prod_{i=1}^{n}e^{-(1-r_i)}=e^{-r\ln n+O(1)}\le Cn^{-r}.$$
Therefore $t_n\le Cn^{-(1+r)}$. Moreover, using the inequality $1-u\ge e^{-u-u^2}$, valid for all $0\le u\le1/8$, we see that $t_n\ge cn^{-(1+r)}$. Therefore $u_n=\sum_{k\ge n}t_k\ge cn^{-r}$. It follows from Fact B.1 that, for large $K$ and $n$,
$$\sum_{k=K+1}^{n}\Pr(T_k=n)\le(1-u_{n+1})^K\le e^{-cKn^{-r}}.$$
Using Fact B.2, we also have
$$\sum_{k=0}^{K}\Pr(T_k=n)\le CK^2t_{n/K}\le CK^{3+r}n^{-(1+r)}.\qquad(32)$$
We then deduce from (31) that, for all $K\in[1,n]$,
$$\Pr(H_n=0)\le C\Big(K^{3+r}n^{-(1+r)}+e^{-cKn^{-r}}\Big).$$
For $K=2n^r\ln n$, we obtain
$$\Pr(H_n=0)\le C\,\frac{(\ln n)^{3+r}}{n^{1-2r-r^2}}=C\,\frac{(\ln n)^{3+r}}{n^{2-(1+r)^2}}.$$
If $0<r<\sqrt2-1$, we have $2-(1+r)^2>0$. This bound may not be optimal, but it is interesting to see that we can still derive rates of convergence from our basic remarks even in this pathological example. □



Proof of Item (ii) of Proposition B.2. We deduce from Facts B.1 and B.3 that, in the summable case,
$$\sum_{k=K+1}^{n}\Pr(T_k=n)\le(1-t_\infty)^K.$$
Therefore, from Facts B.2 and B.3,
$$\Pr(H_n=0)\le\inf_{K=1,\dots,n}\Big\{K^2\big(1-r_{n/K}\big)+(1-t_\infty)^K\Big\}.\qquad(33)$$

□



Proof of Item (iii) of Proposition B.2. In this section, we assume that, for all $k$, $1-r_k\le C_rr^k$, for some $r\in(0,1)$ and a constant $C_r>0$. In that case, for all $k$, we have, by independence,
$$\Pr\Big(\sum_{l=1}^{k}I_l=n\Big)=\sum_{i_1+\dots+i_k=n}\Pr\Big(\bigcap_{l=1}^{k}\{I_l=i_l\}\Big)=\sum_{i_1+\dots+i_k=n}\ \prod_{l=1}^{k}\Pr(I_l=i_l)\le\sum_{i_1+\dots+i_k=n}C_r^kr^{i_1+\dots+i_k}=C_r^kr^n\sum_{i_1+\dots+i_k=n}1.$$
Let us evaluate the numbers $p_{k,n}=\sum_{i_1+\dots+i_k=n}1$. We have $p_{1,n}=1$ and
$$p_{k,n}=\sum_{l=1}^{n-k+1}\ \sum_{i_1+\dots+i_{k-1}=n-l}1=\sum_{l=1}^{n-k+1}p_{k-1,n-l}.$$
Let us then assume that, for some $k$, we have, for all $n\ge k-1$, $p_{k-1,n}\le n^{k-2}/(k-2)!$. Notice that this is the case for $k=2$. Then, for all $n\ge k$,
$$p_{k,n}\le\sum_{l=1}^{n-k+1}\frac{(n-l)^{k-2}}{(k-2)!}=\sum_{j=k-1}^{n-1}\frac{j^{k-2}}{(k-2)!}\le\frac{1}{(k-2)!}\int_{k-1}^{n}x^{k-2}\,dx\le\frac{n^{k-1}}{(k-1)!}.$$
We deduce that
$$\sum_{k=1}^{n}\Pr(T_k=n)\le\sum_{k=1}^{n}C_r^kr^n\frac{n^{k-1}}{(k-1)!}=C_rr^n\sum_{k=1}^{n}\frac{(C_rn)^{k-1}}{(k-1)!}\le C_rr^ne^{C_rn}.$$
Therefore, for $n\ge1$,
$$\Pr(H_n=0)=\sum_{k=1}^{n}\Pr(T_k=n)\le C_r\big(e^{C_r}r\big)^n.$$
Hence, when $C_r<\ln(1/r)$, $e^{C_r}r<1$ and $\Pr(H_n=0)$ decreases exponentially fast. □
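The counting numbers $p_{k,n}$ are the numbers of compositions of $n$ into $k$ positive parts, so $p_{k,n}=\binom{n-1}{k-1}$ by the classical stars-and-bars argument. The short Python sketch below (hypothetical name `compositions_count`) computes them with the recursion used above, which allows one to check the closed form and the bound $p_{k,n}\le n^{k-1}/(k-1)!$ on small values.

```python
from math import comb, factorial

def compositions_count(kmax, nmax):
    """p[k][n] = number of (i_1, ..., i_k) in (N*)^k with i_1 + ... + i_k = n,
    computed with p_{1,n} = 1 and p_{k,n} = sum_{l=1}^{n-k+1} p_{k-1,n-l}."""
    p = [[0] * (nmax + 1) for _ in range(kmax + 1)]
    for n in range(1, nmax + 1):
        p[1][n] = 1
    for k in range(2, kmax + 1):
        for n in range(k, nmax + 1):
            # the last part i_k = l ranges over 1..n-k+1, leaving n - l for k - 1 parts
            p[k][n] = sum(p[k - 1][n - l] for l in range(1, n - k + 2))
    return p

p = compositions_count(8, 25)
print(p[3][6], comb(5, 2))  # 10 10
```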

Appendix C. Concentration of geometric random variables

Let $\xi,\xi_{1:n}$ be i.i.d. geometric random variables with parameter $a$, i.e., $\forall k\ge1$, $P(\xi=k)=(1-a)^{k-1}a$. We obtain in this section the following upper bounds.



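Proposition C.1. Let $C_{1,a}=\frac{1-a}{a}+4\big(\frac{1-a}{a}\big)^2$ and $C_{2,a}=\ln\big(\frac{2-a}{2(1-a)}\wedge2\big)$. Then, for all $x>0$,
$$P\Big(\frac1n\sum_{i=1}^{n}\xi_i-\frac1a\ge x\Big)\le\exp\Big(-\frac{nx}{2}\Big(\frac{x}{2C_{1,a}}\wedge C_{2,a}\Big)\Big),\qquad(34)$$
$$P\Big(\frac1n\sum_{i=1}^{n}\xi_i-\frac1a\le-x\Big)\le\exp\Big(-\frac{nx}{2}\Big(\frac{x}{2C_{1,a}}\wedge1\Big)\Big).$$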
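As a numerical sanity check, the two bounds of Proposition C.1, read as $\exp(-\frac{nx}{2}(\frac{x}{2C_{1,a}}\wedge C_{2,a}))$ and $\exp(-\frac{nx}{2}(\frac{x}{2C_{1,a}}\wedge1))$ with $C_{1,a}=\frac{1-a}{a}+4(\frac{1-a}{a})^2$ and $C_{2,a}=\ln(\frac{2-a}{2(1-a)}\wedge2)$, can be compared with the exact tails. This is possible because $S=\xi_1+\dots+\xi_n$ has the negative binomial law $\Pr(S=s)=\binom{s-1}{n-1}a^n(1-a)^{s-n}$, $s\ge n$. The Python sketch below does so for a few arbitrary parameter triples; all function names are hypothetical, and the check illustrates the bounds rather than proving them.

```python
import math

def c1(a):
    q = (1 - a) / a
    return q + 4 * q * q

def c2(a):
    return math.log(min((2 - a) / (2 * (1 - a)), 2.0))

def upper_rhs(n, a, x):
    """Right-hand side of (34) for Pr(S/n - 1/a >= x)."""
    return math.exp(-n * x / 2 * min(x / (2 * c1(a)), c2(a)))

def lower_rhs(n, a, x):
    """Right-hand side of the lower-deviation bound for Pr(S/n - 1/a <= -x)."""
    return math.exp(-n * x / 2 * min(x / (2 * c1(a)), 1.0))

def negbin_pmf(n, a, s):
    """Pr(S = s) for S = xi_1 + ... + xi_n, xi_i i.i.d. geometric(a) on {1, 2, ...}."""
    return math.comb(s - 1, n - 1) * a**n * (1 - a) ** (s - n)

def exact_upper(n, a, x):
    t = math.ceil(n * (1 / a + x))   # S/n - 1/a >= x  <=>  S >= t
    return 1.0 - sum(negbin_pmf(n, a, s) for s in range(n, t))

def exact_lower(n, a, x):
    t = math.floor(n * (1 / a - x))  # S/n - 1/a <= -x  <=>  S <= t
    return sum(negbin_pmf(n, a, s) for s in range(n, t + 1))

for n, a, x in [(20, 0.5, 1.0), (50, 0.2, 2.0), (30, 0.7, 0.5)]:
    print(n, a, x, exact_upper(n, a, x), upper_rhs(n, a, x),
          exact_lower(n, a, x), lower_rhs(n, a, x))
```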



As a corollary of this result, we obtain the following bound when $n=\lfloor ka/2\rfloor$ and $x=k/(2n)\ge1/a$.

Corollary C.1. Let $k\in\mathbb N^*$, $a\in(0,1)$, $n=\lfloor ka/2\rfloor$, $x=k/(2n)\ge1/a$, let $\xi_{1:n}$ be i.i.d. geometric random variables with parameter $a$, and
$$u_k:=nP\Big(\sum_{i=1}^{n}\xi_i>nx\Big).$$
Then there exists an explicit constant $C'_{1,a}>0$, depending only on $a$, such that, for all $\varepsilon>0$ and all $k\ge k(\varepsilon)$,
$$u_k\le a\,e^{-(C'_{1,a}-\varepsilon)k}.$$

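C.1. Chernoff's bound. Let $Y,Y_{1:n}$ be i.i.d. random variables such that $E(e^{\lambda Y})<\infty$ for all $a<\lambda<b$. Then,
$$\forall x>0,\qquad P\Big(\frac1n\sum_{i=1}^{n}Y_i\ge x\Big)\le\inf_{0\le\lambda<nb}e^{-\lambda x}\Big(E\big(e^{\lambda Y/n}\big)\Big)^n.\qquad(35)$$
Proof. We have, by Markov's inequality and independence of the $Y_i$, for all $0\le\lambda<nb$,
$$P\Big(\frac1n\sum_{i=1}^{n}Y_i\ge x\Big)\le P\Big(e^{\frac\lambda n\sum_{i=1}^{n}Y_i}\ge e^{\lambda x}\Big)\le e^{-\lambda x}E\Big(e^{\frac\lambda n\sum_{i=1}^{n}Y_i}\Big)=e^{-\lambda x}\Big(E\big(e^{\lambda Y/n}\big)\Big)^n.$$
□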
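For geometric variables, (35) can be compared with the exact tail, because $\xi_1+\dots+\xi_n$ is negative binomial. The sketch below (hypothetical names) evaluates the right-hand side of (35) for $Y=\xi$ on a finite grid of $\lambda\in[0,-n\ln(1-a))$, using the moment generating function $ae^{s}/(1-(1-a)e^{s})$ of a geometric variable. A minimum over a grid can only overestimate the infimum, so the comparison with the exact tail remains valid.

```python
import math

def geometric_sum_tail(n, a, t):
    """Exact Pr(xi_1 + ... + xi_n >= t) for i.i.d. xi with Pr(xi = k) = (1-a)^(k-1) a.
    The sum S is negative binomial: Pr(S = s) = C(s-1, n-1) a^n (1-a)^(s-n), s >= n."""
    if t <= n:
        return 1.0
    below = sum(math.comb(s - 1, n - 1) * a**n * (1 - a) ** (s - n) for s in range(n, t))
    return 1.0 - below

def chernoff_bound(n, a, x, grid=2000):
    """Right-hand side of (35) for Y = xi: inf over 0 <= lam < -n ln(1-a) of
    exp(-lam x) * E[exp(lam xi / n)]^n, evaluated on a finite grid of lam."""
    lam_max = -n * math.log(1 - a)
    best = 1.0  # lam = 0 gives the trivial bound 1
    for j in range(1, grid):
        lam = lam_max * j / grid
        s = lam / n
        mgf = a * math.exp(s) / (1 - (1 - a) * math.exp(s))
        best = min(best, math.exp(-lam * x) * mgf**n)
    return best

# n = 10, a = 1/2 (mean 2 per variable), x = 3: the event (1/n) S >= 3 is S >= 30.
print(geometric_sum_tail(10, 0.5, 30), chernoff_bound(10, 0.5, 3.0))
```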



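C.2. Exponential moments of geometric random variables. Let $\xi$ be a geometric random variable with parameter $a$. Then
$$\forall\lambda<-\ln(1-a),\quad E\big(e^{\lambda\xi}\big)=\frac{ae^{\lambda}}{1-(1-a)e^{\lambda}},\qquad\forall\lambda>\ln(1-a),\quad E\big(e^{-\lambda\xi}\big)=\frac{ae^{-\lambda}}{1-(1-a)e^{-\lambda}}.\qquad(36)$$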
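Proof. By definition, we have, for all $\lambda<-\ln(1-a)$,
$$E\big(e^{\lambda\xi}\big)=\sum_{k\ge1}e^{\lambda k}(1-a)^{k-1}a=ae^{\lambda}\sum_{k\ge0}\big((1-a)e^{\lambda}\big)^k=\frac{ae^{\lambda}}{1-(1-a)e^{\lambda}}.$$
Moreover, for all $\lambda>\ln(1-a)$,
$$E\big(e^{-\lambda\xi}\big)=\sum_{k\ge1}e^{-\lambda k}(1-a)^{k-1}a=ae^{-\lambda}\sum_{k\ge0}\big((1-a)e^{-\lambda}\big)^k=\frac{ae^{-\lambda}}{1-(1-a)e^{-\lambda}}.$$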
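The closed forms in (36) are easy to confirm numerically by truncating the defining series. In the sketch below the truncation level `kmax` and the helper names are arbitrary choices; `mgf_closed` raises an error for $\lambda\ge-\ln(1-a)$, where the expectation is infinite. Evaluating it at a negative argument $-\lambda$ gives the second identity in (36).

```python
import math

def mgf_series(a, lam, kmax=500):
    """Truncated series sum_{k=1}^{kmax} e^{lam k} (1-a)^{k-1} a, approximating E[e^{lam xi}]."""
    return sum(math.exp(lam * k) * (1 - a) ** (k - 1) * a for k in range(1, kmax + 1))

def mgf_closed(a, lam):
    """Closed form (36): a e^lam / (1 - (1-a) e^lam), valid when lam < -ln(1-a)."""
    if lam >= -math.log(1 - a):
        raise ValueError("E[exp(lam * xi)] is infinite for lam >= -ln(1-a)")
    return a * math.exp(lam) / (1 - (1 - a) * math.exp(lam))

# lam = 0.2 < -ln(0.7) (first identity); lam = -1.0 gives E[e^{-xi}] (second identity)
for lam in (0.2, -1.0):
    print(lam, mgf_series(0.3, lam), mgf_closed(0.3, lam))
```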



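C.3. Proof of the deviation bounds. Plugging (36) into (35), we obtain, for all $0\le\lambda<-n\ln(1-a)$,
$$P\Big(\frac1n\sum_{i=1}^{n}\xi_i-\frac1a\ge x\Big)\le e^{-\lambda\left(x+\frac1a\right)}\bigg(\frac{ae^{\lambda/n}}{1-(1-a)e^{\lambda/n}}\bigg)^n=a^ne^{-\lambda\left(\frac1a+x-1\right)}e^{-n\ln\left(1-(1-a)e^{\lambda/n}\right)}.$$
Choosing $\lambda=n\varepsilon$ for $\varepsilon\le\ln\big(\frac{2-a}{2(1-a)}\wedge2\big)$, and using the inequalities $e^{\varepsilon}\le1+\varepsilon+\varepsilon^2$ for all $\varepsilon\le\ln2$ and $-\ln(1-u)\le u+u^2$ when $0\le u\le1/2$ (here $u=\frac{1-a}{a}(e^{\varepsilon}-1)\le\frac12$ by the choice of $\varepsilon$), this last bound is equal to
$$e^{-n\left(\frac1a+x-1\right)\varepsilon}\Big(1-\frac{1-a}{a}(e^{\varepsilon}-1)\Big)^{-n}\le e^{-n\left[\left(\frac1a+x-1\right)\varepsilon-\frac{1-a}{a}(e^{\varepsilon}-1)-\left(\frac{1-a}{a}(e^{\varepsilon}-1)\right)^2\right]}\le e^{-n\left[\varepsilon x-\varepsilon^2\left(\frac{1-a}{a}+4\left(\frac{1-a}{a}\right)^2\right)\right]}.$$
Let $C_{1,a}=\frac{1-a}{a}+4\big(\frac{1-a}{a}\big)^2$; choosing $\varepsilon\le x/(2C_{1,a})$, we have $x-\varepsilon C_{1,a}\ge x/2$. Hence, choosing $\varepsilon=\frac{x}{2C_{1,a}}\wedge\ln\big(\frac{2-a}{2(1-a)}\wedge2\big)$, we conclude the proof of (34). Plugging (36) into (35), applied to $Y_i=-\xi_i$, we obtain, for all $\lambda\ge0$,
$$P\Big(\frac1n\sum_{i=1}^{n}\xi_i-\frac1a\le-x\Big)\le e^{-\lambda\left(x-\frac1a\right)}\bigg(\frac{ae^{-\lambda/n}}{1-(1-a)e^{-\lambda/n}}\bigg)^n=a^ne^{-\lambda\left(-\frac1a+x+1\right)}e^{-n\ln\left(1-(1-a)e^{-\lambda/n}\right)}.$$
Choosing $\lambda=n\varepsilon$ with $\varepsilon\le1$, and using $e^{-\varepsilon}-1\le-\varepsilon+\varepsilon^2$ and $-\ln(1-w)\le w+w^2$ for $w\le0$ (here $w=\frac{1-a}{a}(e^{-\varepsilon}-1)$), this last bound is equal to
$$e^{-n\left(-\frac1a+x+1\right)\varepsilon}\Big(1-\frac{1-a}{a}(e^{-\varepsilon}-1)\Big)^{-n}\le e^{-n\left(-\frac1a+x+1\right)\varepsilon+n\left[\frac{1-a}{a}(e^{-\varepsilon}-1)+\left(\frac{1-a}{a}(e^{-\varepsilon}-1)\right)^2\right]}\le e^{-n\left[\varepsilon x-\varepsilon^2\left(\frac{1-a}{a}+4\left(\frac{1-a}{a}\right)^2\right)\right]}.$$
With $C_{1,a}$ as above, choosing $\varepsilon\le x/(2C_{1,a})$, we have $x-\varepsilon C_{1,a}\ge x/2$; hence, choosing $\varepsilon=\frac{x}{2C_{1,a}}\wedge1$, we conclude the proof. □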